Re: Unicode::Collate question

On 1 Dec 2003, at 16:46, Jarkko Hietaniemi wrote:

Thank you both for your replies. What about sorting words in one particular
language, is Perl's sort() good enough? I'm wondering, since language isn't
one of sort()'s arguments.
First we need to define "good enough"... again, if you are sorting
"simple" English or Hawaiian, you are probably fine.  But as soon
as your "words" contain real-life complications like

        - letters like é or ö or æ or ...
        - beyond-Latin-1 letters like Ă or Ł or Б or א or अ or ぁ or ...
        - peoples' names
        - acronyms and the like
        - do all the characters matter or just the letters
        - sorting mixed letters and digits
        - Roman numbers
you are on your own. For the first item, the use of the locale pragma can help
as long as your data is 8-bit and in one locale. As soon as data becomes Unicode,
Perl will, as far as I know, ignore localeness for sorting.
If you find yourself wanting some complex sorting, look into CPAN and what you
can find on search.cpan.org with "sort"; for example, Sort::ArbBiLex might
be useful.

OK, this is in line with how I understood this paragraph in perluniintro:

           The short answer is that by default, Perl compares strings
           ("lt", "le", "cmp", "ge", "gt") based only on the code points of
           the characters. In the above case, the answer is "after", since
           0x00C1 > 0x00C0.
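
The code-point rule quoted above can be checked with a minimal sketch like the
following. The sample strings and variable names are illustrative assumptions,
not taken from perluniintro.

```perl
use strict;
use warnings;

# U+00C0 (A with grave) and U+00C1 (A with acute), written as escapes
# so the encoding of this source file does not matter.
my $grave = "\x{C0}";
my $acute = "\x{C1}";

# cmp looks only at code points: 0x00C1 > 0x00C0, so the result is 1 ("after").
my $order = $acute cmp $grave;
print "cmp result: $order\n";

# The same rule places "z" (0x7A) before e-acute (0xE9),
# which is not what a French dictionary would do.
my @words  = ("\x{E9}", "z");
my @sorted = sort @words;
printf "first code point after sort: 0x%02X\n", ord $sorted[0];
```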

So is it just by chance that these French words are accurately sorted?

% perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'

cote coté côte côté
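
For comparison, here is a hedged sketch that sorts the same four words twice:
once with the built-in sort (code points only) and once with Unicode::Collate.
It assumes that traditional French dictionary order compares accents starting
from the end of the word, which is what the module's "backwards => 2" option
is meant to model; whether that is the order wanted here is a separate question.

```perl
use strict;
use warnings;
use utf8;                 # the word list below contains UTF-8 literals
use Unicode::Collate;

binmode(STDOUT, ":utf8");

my @words = qw(côte côté cote coté);

# Built-in sort: plain code-point comparison, as in the one-liner above.
my @by_codepoint = sort @words;

# Unicode::Collate with level-2 (accent) weights compared in reverse,
# an assumption about the French ordering wanted.
my $collator     = Unicode::Collate->new(backwards => 2);
my @by_collation = $collator->sort(@words);

print "code points: @by_codepoint\n";
print "collated:    @by_collation\n";
```

With these four words, the code-point sort gives "cote coté côte côté", while
the collator with backwards accents should give "cote côte coté côté".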

Thanks,
--
Eric Cholet