Re: Unicode::Collate question

OK, this is in line with how I understood this paragraph in perluniintro:
The short answer is that by default, Perl compares strings ("lt", "le", "cmp", "ge", "gt") based only on the code points of the characters. In the above case, the answer is "after", since 0x00C1 > 0x00C0.
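A minimal sketch of that rule, assuming the "above case" in perluniintro compares U+00C1 with U+00C0 as the quoted numbers suggest. The variable names are just for illustration; only the built-in cmp operator is involved, with no locale and no collation tables:

```perl
use strict;
use warnings;

# Two one-character strings, written as explicit code points.
my $a_acute = "\x{00C1}";   # LATIN CAPITAL LETTER A WITH ACUTE
my $a_grave = "\x{00C0}";   # LATIN CAPITAL LETTER A WITH GRAVE

# cmp looks only at code points: 0x00C1 > 0x00C0, so the result is 1.
my $order = $a_acute cmp $a_grave;
print $order > 0 ? "after\n" : "before\n";   # prints "after"
```

The same code-point comparison is what sort uses when it is given no comparison block.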

So is it just by chance that these French words are accurately sorted?


I think a "qualified yes" here is in order...

% perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'
cote coté côte côté


Is this the famous French "backwards accents" rule in action?
(http://www-clips.imag.fr/geta/gilles.serasset/tri-du-francais.html)
(no, I don't speak French)

But in this case, with those particular words, I think ISO Latin 1 (none
of the characters are beyond ISO Latin 1) just "happens" to work right.
o < ô, and e < é.
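To make the "happens to work" part concrete, here is a small sketch that prints the code points involved and repeats the plain sort from the one-liner. It assumes nothing beyond core Perl; the point is that the second character (o vs ô) decides first and the last character (e vs é) decides second, purely by number:

```perl
use strict;
use warnings;
use utf8;                                   # the literals below are UTF-8
binmode(STDOUT, ':encoding(UTF-8)');

my @words = qw(côte côté cote coté);

# The code points behind "o < ô" and "e < é" (all within Latin-1).
printf "%s U+%04X\n", $_, ord $_ for qw(o ô e é);

# Plain sort compares code point by code point.
my @sorted = sort @words;
print "@sorted\n";                          # cote coté côte côté
```

The printed code points are U+006F, U+00F4, U+0065 and U+00E9, so both unaccented letters sort below their accented forms.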

Some more links (database related, since they have had to think about these things for years already) that hopefully explain some of the problems related to "linguistic sorting":

http://www.engin.umich.edu/caen/wls/software/oracle/server.901/a90236/ch4.htm
http://developer.mimer.com/documentation/html_92/Mimer_SQL_Engine_DocSet/Mimer_Concepts14.html
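For comparison, a rough sketch of a linguistic sort of the same four words with Unicode::Collate (the module this thread is about), assuming its default DUCET table is installed as it is in core Perl. Its `backwards` option reverses the weights of a given level; level 2 (accents) is the French "backward accents" case. This is an illustration of the option, not a complete French tailoring:

```perl
use strict;
use warnings;
use utf8;
binmode(STDOUT, ':encoding(UTF-8)');
use Unicode::Collate;

my @words = qw(côte côté cote coté);

# Default Unicode Collation Algorithm table, accents compared forwards.
my $default = Unicode::Collate->new();
my @uca = $default->sort(@words);

# backwards => 2 compares level-2 (accent) weights from the end of the
# string, i.e. the French "backward accents" rule.
my $french = Unicode::Collate->new(backwards => 2);
my @fr = $french->sort(@words);

print "default:   @uca\n";   # cote coté côte côté
print "backwards: @fr\n";    # cote côte coté côté
```

With the default table these four words come out in the same order as the plain code-point sort; with backwards accents the last accent is weighed first, so the order changes.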


Thanks,
--
Eric Cholet

--

Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen