OK, this is in line with how I understood this paragraph in
perluniintro:
The short answer is that by default, Perl compares strings ("lt", "le",
"cmp", "ge", "gt") based only on the code points of the characters. In
the above case, the answer is "after", since 0x00C1 > 0x00C0.
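For illustration only, here is a small Python sketch of the code-point comparison the quoted paragraph describes. The helper name `cmp_code_points` is hypothetical. Python's built-in string comparison also works on code points, so the helper returns the same -1/0/1 answer Perl's `cmp` gives for these two characters.

```python
# Hypothetical helper mimicking Perl's cmp: compare strings purely by code point.
def cmp_code_points(a: str, b: str) -> int:
    """Return -1, 0 or 1, like Perl's cmp, using only code points."""
    return (a > b) - (a < b)

a_acute = "\u00C1"  # LATIN CAPITAL LETTER A WITH ACUTE
a_grave = "\u00C0"  # LATIN CAPITAL LETTER A WITH GRAVE

result = cmp_code_points(a_acute, a_grave)
answer = "after" if result > 0 else "before" if result < 0 else "equal"
print(hex(ord(a_acute)), hex(ord(a_grave)), answer)  # prints: 0xc1 0xc0 after
```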
So is it just by chance that these French words are accurately sorted?
I think a "qualified yes" here is in order...
% perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'
cote coté côte côté
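The one-liner can be mirrored in Python as a sketch for comparison. This is not part of the original exchange. Python also sorts strings by code point, and listing each word's code points shows where the order comes from. The accented letters are written as escapes so the example does not depend on the source file's normalization form.

```python
# The same four words as in the Perl one-liner: côte, côté, cote, coté.
words = ["c\u00f4te", "c\u00f4t\u00e9", "cote", "cot\u00e9"]

# Plain sorted() compares str values code point by code point,
# like Perl's default sort without a locale or collation module.
ordered = sorted(words)
print(" ".join(ordered))  # prints: cote coté côte côté

# Show the code points that decide the order (o U+006F < ô U+00F4, e U+0065 < é U+00E9).
for w in ordered:
    print(w, [f"U+{ord(c):04X}" for c in w])
```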
Is this the famous French "backwards accents" rule in action?
(http://www-clips.imag.fr/geta/gilles.serasset/tri-du-francais.html)
(no, I don't speak French)
But in this case, with those particular words, I think ISO Latin 1 (none
of the characters are beyond ISO Latin 1) just "happens" to work right:
o < ô (U+006F < U+00F4) and e < é (U+0065 < U+00E9).
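Whether the code-point result matches the French "backwards accents" rule can be checked with a toy model. The Python sketch below is an illustrative assumption, not Perl's or any library's collation. The `accent_key` helper is hypothetical, and ordering accents by the code points of their combining marks is a simplification. It compares the base letters first and the accents second, reading the accents either left to right or, as the backwards rule does, right to left. For these four words the left-to-right variant reproduces the code-point order. The backwards variant moves côte ahead of coté.

```python
import unicodedata

def accent_key(word: str, backwards: bool) -> tuple:
    # Toy two-level key (an illustrative assumption, not a real collation):
    # level 1 = the letters with accents stripped,
    # level 2 = the combining marks attached to each letter after NFD,
    #           compared by code point ("" means "no accent" and sorts first).
    base, marks = [], []
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) and marks:
            marks[-1] += ch      # attach the mark to the preceding letter
        else:
            base.append(ch)
            marks.append("")
    if backwards:
        marks.reverse()          # read accents from the end of the word
    return ("".join(base), tuple(marks))

words = ["c\u00f4te", "c\u00f4t\u00e9", "cote", "cot\u00e9"]  # côte côté cote coté

forward = sorted(words, key=lambda w: accent_key(w, backwards=False))
backward = sorted(words, key=lambda w: accent_key(w, backwards=True))
print(" ".join(forward))   # prints: cote coté côte côté
print(" ".join(backward))  # prints: cote côte coté côté
```

Real linguistic sorting (for example Perl's Unicode::Collate module, or the database collations in the links that follow) uses full weight tables. This sketch only illustrates the direction in which accents are compared.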
Some more links (database-related, since databases have had to think
about these things for years already) that hopefully explain some of
the problems related to "linguistic sorting":
http://www.engin.umich.edu/caen/wls/software/oracle/server.901/a90236/ch4.htm
http://developer.mimer.com/documentation/html_92/Mimer_SQL_Engine_DocSet/Mimer_Concepts14.html
Thanks,
--
Eric Cholet
--
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'. It is 'dead'." -- Jack Cohen