Michael Kay wrote:
>> I'm not sure I'm following here--at least using Java RuleBasedCollator
>> you should be able to achieve any collation sequence whatsoever.
>> But I'm not sure what you mean by sorting 646 before 10646.
>
> A possible algorithm is that any sequence of digits counts as a single
> collation unit, which is collated before the first collation unit derived
> from non-digit characters, and has a collation value equal to its decimal
> value.
>
> I don't believe you can achieve this with a RuleBasedCollator.
Ah, I understand now--I misunderstood your comment as being about the
standards, not the strings "646" and "10646".
I think you are correct, although I'll have to test it.
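For anyone who wants to run that test, here is a minimal sketch that sorts two identifiers with the JDK's default English collator, which is a RuleBasedCollator instance. The class and method names are hypothetical. It only shows the stock ordering, not that no custom rule set could do better:

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class: sorts strings with the JDK's default
// English collator (a RuleBasedCollator instance).
public class DefaultCollationDemo {

    public static String[] sortWithDefaultCollator(String... ids) {
        String[] copy = ids.clone();
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        Arrays.sort(copy, collator);
        return copy;
    }

    public static void main(String[] args) {
        // Digits are compared one character at a time, so '1' < '6'
        // places "ISO 10646" ahead of "ISO 646".
        String[] sorted = sortWithDefaultCollator("ISO 646", "ISO 10646");
        System.out.println(Arrays.toString(sorted));
    }
}
```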
Of course, this type of rule can be implemented with a custom
Comparator that applies whatever rule you want, delegating the
character-level comparison to a rule-based collator. I don't think
there's any way for a purely declarative mechanism, which is what I
understand the UCA to define (and what RuleBasedCollator implements),
to handle all cases.
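A minimal sketch of that approach, assuming ASCII digits only and reading "collated before" as "a number sorts before any non-digit unit at the same position". NumericAwareComparator and units() are hypothetical names, not part of the JDK; non-digit runs are compared with whatever Collator you pass in:

```java
import java.math.BigInteger;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical comparator: each run of ASCII digits is one unit ordered by
// numeric value and placed before any non-digit unit; non-digit runs are
// delegated to a (rule-based) Collator.
public class NumericAwareComparator implements Comparator<String> {
    private final Collator delegate;

    public NumericAwareComparator(Collator delegate) {
        this.delegate = delegate;
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Split a string into maximal runs of digits and non-digits.
    public static List<String> units(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            boolean digit = isDigit(s.charAt(i));
            int j = i + 1;
            while (j < s.length() && isDigit(s.charAt(j)) == digit) j++;
            out.add(s.substring(i, j));
            i = j;
        }
        return out;
    }

    @Override
    public int compare(String a, String b) {
        List<String> ua = units(a), ub = units(b);
        int n = Math.min(ua.size(), ub.size());
        for (int k = 0; k < n; k++) {
            String x = ua.get(k), y = ub.get(k);
            boolean xd = isDigit(x.charAt(0)), yd = isDigit(y.charAt(0));
            int c;
            if (xd && yd) {
                // Compare by decimal value; BigInteger avoids overflow.
                c = new BigInteger(x).compareTo(new BigInteger(y));
            } else if (xd) {
                c = -1; // number sorts before a non-digit unit
            } else if (yd) {
                c = 1;
            } else {
                c = delegate.compare(x, y);
            }
            if (c != 0) return c;
        }
        int bySize = Integer.compare(ua.size(), ub.size());
        // Tie-break (e.g. "007" vs "7") with the plain collator.
        return bySize != 0 ? bySize : delegate.compare(a, b);
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>(
                Arrays.asList("ISO 10646", "ISO 646", "ISO 8859"));
        ids.sort(new NumericAwareComparator(Collator.getInstance(Locale.ENGLISH)));
        System.out.println(ids);
    }
}
```

Comparing whole runs rather than single characters keeps the delegate collator in charge of everything except digits, which is the split between custom and declarative logic described above.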
Cheers,
E.
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122
eliot(_at_)innodata-isogen(_dot_)com
www.innodata-isogen.com