What are the rules for accenting? I suppose that most people if you asked
them what ø was would say that's an o with a slash through it, and an æ was
an a and an e stuck really close together, hence mnemonic entities, but is
that the rule for determining what is an accented character? We asked 100
people and 90 gave the following answer?
I used the term "accent" very loosely. For the full gory detail, see the
Unicode Collation Algorithm [1]. I don't know if Microsoft follow this
precisely, but they are probably using the same principles.
As for how they collected the data - yes, they probably asked a few
non-randomly selected people, and they looked in some (possibly out of date)
textbooks, and when they got it badly wrong people complained and they
sometimes fixed it. There isn't a single right answer - different publishers
sort their dictionaries and indexes and phone books in different ways, and
none of them is wrong. The UCA is written as if there is a single correct
answer, but there isn't.
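To make the "no single right answer" point concrete, here is a rough Python
sketch using only the standard library. It is not the UCA and says nothing
about what Microsoft does; the function names (strip_marks,
sort_accents_ignored, sort_by_code_point) are made up for illustration. It
treats a character as "accented" only if Unicode gives it a canonical
decomposition into a base letter plus combining marks. By that test é is an
accented e, but ø and æ are letters in their own right, whatever most people
would say they look like.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Crude accent-insensitive key: decompose to NFD, drop combining marks.

    Assumption: "accented" means "has a canonical decomposition". This is a
    simplification for illustration, not the Unicode Collation Algorithm.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def sort_accents_ignored(words):
    """Sort on the mark-stripped key, using the original string as tie-break."""
    return sorted(words, key=lambda w: (strip_marks(w).lower(), w))


def sort_by_code_point(words):
    """Sort by raw code point, which is what a naive string sort does."""
    return sorted(words)


# Escapes keep the sample independent of the file's encoding:
# \u00e9 is e-acute, \u00f8 is o-with-stroke, \u00e6 is the ae ligature.
words = ["zebra", "\u00e9clair", "eagle", "\u00f8re", "oak"]

by_base_letter = sort_accents_ignored(words)
by_code_point = sort_by_code_point(words)

print(by_base_letter)
print(by_code_point)
```

The two orderings disagree about where "éclair" belongs: the first files it
next to "eagle", the second puts it after "zebra". Both leave "øre" at the
end, because ø has no decomposition to strip. A collation that sorted ø with
o would need its own table of rules, and which table to use is exactly the
sort of choice that publishers make differently.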
Michael Kay
http://www.saxonica.com/
[1] http://www.unicode.org/reports/tr10/