xsl-list

Re: xsl:sort with msxml english language, danish characters, weird results

2004-10-25 08:50:42
Michael Kay wrote:

The UCA is written as if there is a single correct
answer, but there isn't.

The UCA doesn't define a particular collation sequence for any language; rather, it defines the requirements for how collation mechanisms should allow you to define the collation rules for a given language and script. The Unicode standard is very clear that collation is highly variable and that there is no single answer for any language or script. [Even for a single language you might have different collation rules for glossaries and indexes, for example.]

Java's built-in RuleBasedCollator class implements a collation mechanism that, as far as I know, conforms to the UCA in that it provides the functionality needed (although it may not fully address how composed and decomposed characters are handled; I'm not sure about the details there). The IBM ICU package provides a more complete implementation of the UCA, and the ICU4J package provides an alternative set of built-in language-specific collators that are more complete and accurate than those shipped with Java.
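
As a rough illustration of what "defining the collation rules" can look like with java.text.RuleBasedCollator, here is a minimal sketch. The class name DanishSortSketch, the method sortDanish, and the rule string itself are made up for this example. The rules are only one plausible ordering (plain a-z, then ae-ligature, o-slash, a-ring at the end); they are not an authoritative Danish collation and they ignore details such as "aa".

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DanishSortSketch {

    // Hypothetical tailoring: "<" marks a primary difference, "," a
    // tertiary (case) difference. The three extra letters follow z.
    static final String RULES =
          "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I"
        + " < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R"
        + " < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z"
        + " < \u00E6, \u00C6"   // ae ligature
        + " < \u00F8, \u00D8"   // o with stroke
        + " < \u00E5, \u00C5";  // a with ring above

    // Returns a sorted copy of the input; the input list is not modified.
    static List<String> sortDanish(List<String> words) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            // Only reachable if RULES above is edited into something invalid.
            throw new IllegalArgumentException("bad collation rules", e);
        }
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
            "\u00E5l",     // a-ring + "l"
            "zebra",
            "\u00F8l",     // o-slash + "l"
            "abe",
            "\u00E6ble");  // ae + "ble"
        System.out.println(sortDanish(words));
    }
}
```

The intent is that words starting with the three extra letters sort after "zebra", in the order the rules list them. A different rule string (say, one for an index rather than a glossary) would give a different but equally legitimate order, which is the point being made above.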

Cheers,

E.
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122

eliot(_at_)innodata-isogen(_dot_)com
www.innodata-isogen.com