On 1 Dec 03, at 16:46, Jarkko Hietaniemi wrote:
Thank you both for your replies. What about sorting words in one particular
language, is Perl's sort() good enough? I'm wondering, since language isn't
one of sort()'s arguments.

First we need to define "good enough"... again, if you are sorting
"simple" English or Hawaiian, you are probably fine. But as soon
as your "words" contain real-life complications like
- letters like é or ö or æ or ...
- beyond-Latin-1 letters like Ă or Ł or Б or א or अ or ぁ or ...
- people's names
- acronyms and the like
- do all the characters matter or just the letters
- sorting mixed letters and digits
- Roman numerals
you are on your own. For the first item, the locale pragma can help as long
as your data is 8-bit and in one locale. As soon as the data becomes Unicode,
Perl will, as far as I know, ignore the locale when sorting.

If you find yourself wanting some complex sorting, look on CPAN and see what
a search for "sort" on search.cpan.org turns up; Sort::ArbBiLex, for example,
might be useful.
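
As a rough illustration of the difference between code point order and a collation module, here is a minimal sketch. It uses Unicode::Collate rather than the Sort::ArbBiLex mentioned above (a swapped-in module, chosen only because its basic interface is small), and the word list is made up for the example.

```perl
use strict;
use warnings;
use utf8;                          # the string literals below are UTF-8
use Unicode::Collate;

binmode(STDOUT, ":utf8");

# Hypothetical sample data, not taken from the thread.
my @words = qw(zebra Zoo éclair apple eclair);

# Plain sort() compares code points: "Z" (0x5A) sorts before every
# lowercase ASCII letter, and "é" (0xE9) sorts after "z" (0x7A).
my @by_codepoint = sort @words;

# Unicode::Collate compares base letters first, then accents, then case,
# using the default Unicode collation table (not tailored to any language).
my $collator = Unicode::Collate->new;
my @by_uca   = $collator->sort(@words);

print "code points: @by_codepoint\n";   # Zoo apple eclair zebra éclair
print "collated:    @by_uca\n";         # apple eclair éclair zebra Zoo
```

The default table is language-neutral, so language-specific rules would still need tailoring on top of this.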

Ok, this is in line with how I understood this paragraph in perluniintro:

    The short answer is that by default, Perl compares strings ("lt",
    "le", "cmp", "ge", "gt") based only on the code points of the
    characters. In the above case, the answer is "after", since
    0x00C1 > 0x00C0.

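The comparison described in that paragraph can be checked directly. This is a small sketch, assuming the two characters in question are U+00C0 (À) and U+00C1 (Á) as the quoted code points suggest; the variable names are only for illustration.

```perl
use strict;
use warnings;

binmode(STDOUT, ":utf8");

my $a_grave = "\x{C0}";    # À, code point 0x00C0
my $a_acute = "\x{C1}";    # Á, code point 0x00C1

# cmp returns -1, 0 or 1; with no locale or collation module in effect
# it orders the strings by the code points of their characters.
my $result = $a_acute cmp $a_grave;

printf "%s cmp %s = %d\n", $a_acute, $a_grave, $result;   # Á cmp À = 1
print "Á sorts after À\n" if $a_acute gt $a_grave;
```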
So is it just by chance that these French words are accurately sorted?
% perl -Mutf8 -e 'binmode(STDOUT, ":utf8"); print join " ", sort qw(côte côté cote coté)'
cote coté côte côté
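
For comparison, here is a hedged sketch of how the same four words come out under a rule that weighs accents from the end of the word, which is the ordering traditionally described for French dictionaries. It assumes Unicode::Collate and its `backwards` option; whether that is the order wanted for a given application is a separate question.

```perl
use strict;
use warnings;
use utf8;                          # the string literals below are UTF-8
use Unicode::Collate;

binmode(STDOUT, ":utf8");

my @words = qw(côte côté cote coté);

# Code point order: "o" (0x6F) < "ô" (0xF4) decides first, then
# "e" (0x65) < "é" (0xE9).
my @by_codepoint = sort @words;

# backwards => 2 makes the collator compare accent differences starting
# from the end of the string (assumption: this models the traditional
# French dictionary rule).
my $french = Unicode::Collate->new(backwards => 2);
my @accents_from_end = $french->sort(@words);

print "code points:      @by_codepoint\n";       # cote coté côte côté
print "accents from end: @accents_from_end\n";   # cote côte coté côté
```

The two orders differ in the middle pair, so the code point result matches only a rule that compares accents from the start of the word.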
Thanks,
--
Eric Cholet