David Carlisle wrote:
Dr. Johnson and every lexicographer since has used case as the least
significant, most rapidly varying element in ordering. The example I
have in front of me from the Concise Oxford Dictionary lists daily -
Dalmatian - dalmatic and I would not expect it to do anything else.
Dictionaries are not really a good example to follow here as they don't
have to deal with all strings; the dictionary probably doesn't list
DAILY or dalmatioN at all, but xsl:sort has to deal with these things.
I haven't seen anyone mention that in the general case it is not
possible for any XSLT implementation to define the appropriate collation
rules for all possible uses of sort--the variance even within a single
language is too great, as evidenced by, for example, the discussion of
back-of-the-book index sorting in the _Chicago Manual of Style_. In
addition, the Unicode standard is very clear that the ordering of
characters in the Unicode character set does not define the collation
sequence for any language or writing system. While most alphabetic
languages have a natural or default collation order, syllabic and
ideographic languages mostly do not.
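To make the difference concrete, here is a minimal Java sketch (my own illustration, not anything an XSLT processor is required to do) that sorts the dictionary words quoted above twice: once by raw character value, and once with the JDK's stock English Collator at tertiary strength, where case only breaks ties. The class and method names are made up for the example.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CodePointVsCollator {
    /** Sorts by raw UTF-16 code unit value, which is what String.compareTo does. */
    static String[] byCodePoint(String... words) {
        String[] out = words.clone();
        Arrays.sort(out);
        return out;
    }

    /** Sorts with the JDK's English collator; case is only a tertiary difference. */
    static String[] byCollator(String... words) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.TERTIARY);
        String[] out = words.clone();
        Arrays.sort(out, collator);
        return out;
    }

    public static void main(String[] args) {
        String[] words = {"dalmatic", "Dalmatian", "daily"};
        // Upper-case 'D' has a lower code than 'd', so "Dalmatian" jumps to the front.
        System.out.println(Arrays.toString(byCodePoint(words)));
        // The collator gives the dictionary order: daily, Dalmatian, dalmatic.
        System.out.println(Arrays.toString(byCollator(words)));
    }
}
```

The stock collator happens to match the dictionary here, but that is a property of one locale's default rules, not of Unicode itself.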
For example, Simplified Chinese is collated in terms of its pinyin
transliteration. That is, a character transliterated as "pi" would sort
under "p". But there is no universal agreement about what the
transliteration of every character is--some authorities might
transliterate "pi" as "bi", for example.
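A rough Java sketch of why the choice of transliteration matters: the same three place names sort differently depending on whether the sort key comes from a Hanyu Pinyin table or a Wade-Giles table. The tiny hard-coded tables and all class and method names are assumptions for illustration only; a real collator would need a complete, authority-specific table.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

class TransliterationOrdering {
    // Place names written with Unicode escapes so the source file stays ASCII.
    static final String BEIJING  = "\u5317\u4EAC";
    static final String NANJING  = "\u5357\u4EAC";
    static final String SHANGHAI = "\u4E0A\u6D77";

    /** Illustrative sort keys using Hanyu Pinyin romanization. */
    static Map<String, String> pinyin() {
        Map<String, String> table = new HashMap<>();
        table.put(BEIJING, "beijing");
        table.put(NANJING, "nanjing");
        table.put(SHANGHAI, "shanghai");
        return table;
    }

    /** Illustrative sort keys using Wade-Giles romanization. */
    static Map<String, String> wadeGiles() {
        Map<String, String> table = new HashMap<>();
        table.put(BEIJING, "pei-ching");
        table.put(NANJING, "nan-ching");
        table.put(SHANGHAI, "shang-hai");
        return table;
    }

    /** Compares strings by their transliteration; unknown strings compare as themselves. */
    static Comparator<String> byKey(Map<String, String> table) {
        return Comparator.comparing((String s) -> table.getOrDefault(s, s));
    }

    static String[] sorted(Comparator<String> rule, String... names) {
        String[] out = names.clone();
        Arrays.sort(out, rule);
        return out;
    }

    public static void main(String[] args) {
        // Pinyin keys put Beijing first; Wade-Giles keys put Nanjing first.
        System.out.println(Arrays.toString(sorted(byKey(pinyin()), SHANGHAI, NANJING, BEIJING)));
        System.out.println(Arrays.toString(sorted(byKey(wadeGiles()), SHANGHAI, NANJING, BEIJING)));
    }
}
```

Neither result is "wrong"; each is correct for the authority whose table was plugged in, which is the point.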
Not to mention that collation rules could vary within a single document.
For example, the index might use one set of rules (for example, ignoring
punctuation and spaces) while a generated glossary or parts list
respects them.
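As a small sketch of two rule sets living side by side, the Java below sorts the same terms with an "index" comparator that drops everything except letters and digits and a "glossary" comparator that keeps spaces and punctuation. Both are deliberately simplified: they use String.CASE_INSENSITIVE_ORDER rather than a real locale-aware collator, and the names are invented for the example.

```java
import java.util.Arrays;
import java.util.Comparator;

class PerContextOrdering {
    /** Glossary-style rule: spaces and punctuation take part in the comparison. */
    static final Comparator<String> RESPECTING = String.CASE_INSENSITIVE_ORDER;

    /** Index-style rule: compare only the letters and digits of each term. */
    static final Comparator<String> IGNORING =
        Comparator.comparing((String s) -> s.replaceAll("[^\\p{L}\\p{N}]", ""),
                             String.CASE_INSENSITIVE_ORDER);

    static String[] sorted(Comparator<String> rule, String... terms) {
        String[] out = terms.clone();
        Arrays.sort(out, rule);
        return out;
    }

    public static void main(String[] args) {
        String[] terms = {"Newton", "New York", "Newark"};
        // Space sorts before letters: New York, Newark, Newton
        System.out.println(Arrays.toString(sorted(RESPECTING, terms)));
        // Space is dropped first: Newark, Newton, New York
        System.out.println(Arrays.toString(sorted(IGNORING, terms)));
    }
}
```

A stylesheet producing both an index and a glossary would need to select between two such comparators per xsl:sort, not once per document.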
Any XSLT implementation that does not provide a way for users to easily
integrate custom collators will not be useful for a number of important
use cases, including producing back-of-the-book indexes. In particular,
any application that needs to do culturally- and editorially-appropriate
collation in non-Western languages (essentially the languages and locales
for which Java does not currently provide appropriate Collator
implementations) will only be able to use XSLT processors that provide a
way to specify custom collators.
As far as I know, only Saxon provides this facility today (although I
haven't looked into MS-XSL's extension facilities since all my work is
done in Java).
Cheers,
Eliot
--
W. Eliot Kimber, eliot(_at_)isogen(_dot_)com
Consultant, ISOGEN International
1016 La Posada Dr., Suite 240
Austin, TX 78752 Phone: 512.656.4139
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list