xsl-list

RE: xsl:sort with msxml english language, danish characters, weird results

2004-10-25 05:26:07
What are the rules for accenting? I suppose that most people if you asked them
what ø was would say that's an o with a slash through it, and an æ was an a and
an e stuck really close together, hence mnemonic entities, but is that the rule
for determining what is an accented character? We asked 100 people and 90 gave
the following answer?

I used the term "accent" very loosely. For the full gory detail, see the
Unicode Collation Algorithm [1]. I don't know if Microsoft follow this
precisely, but they are probably using the same principles.

As for how they collected the data - yes, they probably asked a few
non-randomly selected people, and they looked in some (possibly out of date)
textbooks, and when they got it badly wrong people complained and they
sometimes fixed it. There isn't a single right answer - different publishers
sort their dictionaries and indexes and phone books in different ways, and
none of them is wrong. The UCA is written as if there is a single correct
answer, but there isn't.
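
To make the "no single right answer" point concrete, here is a minimal Python
sketch that sorts the same words under two hand-written conventions. Both key
functions, the folding table and the word list are assumptions made up for
illustration; this is not the UCA, and it is not a claim about what MSXML or
any other processor actually does.

```python
# Illustrative only: two hand-written sort keys, not the Unicode Collation
# Algorithm and not the behaviour of any particular XSLT processor.

# Convention 1 (Danish-style): æ, ø and å are letters in their own right,
# placed after z.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"


def danish_key(word):
    """Map each letter to its position in the Danish alphabet.

    Raises ValueError for characters outside DANISH_ALPHABET, which is
    acceptable for this small sketch.
    """
    return [DANISH_ALPHABET.index(ch) for ch in word.lower()]


# Convention 2 (assumed "English-style" folding): treat ø as o, æ as ae
# and å as a, then compare as plain strings.
FOLD = {"ø": "o", "æ": "ae", "å": "a"}


def folded_key(word):
    """Replace the Danish letters with unaccented look-alikes."""
    return "".join(FOLD.get(ch, ch) for ch in word.lower())


words = ["zebra", "øl", "ost", "æble", "abe", "år"]

danish_order = sorted(words, key=danish_key)
folded_order = sorted(words, key=folded_key)

print(danish_order)  # ['abe', 'ost', 'zebra', 'æble', 'øl', 'år']
print(folded_order)  # ['abe', 'æble', 'år', 'øl', 'ost', 'zebra']
```

Neither result is wrong; each just encodes a different view of whether ø is a
letter of its own or an o with a mark on it.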

Michael Kay
http://www.saxonica.com/

[1] http://www.unicode.org/reports/tr10/