perl-unicode

Re: Unicode::Collate question

2003-12-01 09:30:05
Thank you both for your replies. What about sorting words in one particular language: is Perl's sort() good enough? I'm wondering, since the language isn't
one of sort()'s arguments.

First we need to define "good enough"... again, if you are sorting
"simple" English or Hawaiian, you are probably fine.  But as soon
as your "words" contain real-life complications like

        - letters like é or ö or æ or ...
        - beyond-Latin-1 letters like Ă or Ł or Б or א or अ or ぁ or ...
        - peoples' names
        - acronyms and the like
        - whether all the characters matter or just the letters
        - sorting mixed letters and digits
        - Roman numerals

you are on your own. For the first item, the locale pragma ("use locale") can help as long as your data is 8-bit and all in one locale. As soon as the data becomes Unicode,
Perl will, as far as I know, ignore the locale when sorting.
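
To make the first item concrete, here is a minimal sketch of what plain sort() does when no locale is in effect. The word list is made up for illustration. sort() then falls back to a code point comparison, so capitals land before lowercase letters and é lands after z:

```perl
use strict;
use warnings;
use utf8;                               # this source contains a literal é
binmode(STDOUT, ':encoding(UTF-8)');

# Made-up sample data, not taken from the original question.
my @words = ('zebra', 'éclair', 'apple', 'Zulu');

# Default sort() uses cmp, which compares code point by code point:
# 'Z' (U+005A) < 'a' (U+0061) < 'z' (U+007A) < 'é' (U+00E9).
my @sorted = sort @words;

print join(' ', @sorted), "\n";         # Zulu apple zebra éclair
```

Whether that counts as "good enough" depends on the data; for a dictionary-style listing it almost certainly does not.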

If you find yourself wanting some complex sorting, look on CPAN and see what a search for "sort" on search.cpan.org turns up; Sort::ArbBiLex, for example, might
be useful.
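
Since this thread started with Unicode::Collate, here is a rough sketch of what it does with a small list. The sample words are invented, and the collator is created with default settings (no language-specific tailoring), so treat the result as the generic Unicode ordering, not as the correct order for any particular language:

```perl
use strict;
use warnings;
use utf8;                               # this source contains a literal é
use Unicode::Collate;
binmode(STDOUT, ':encoding(UTF-8)');

# Made-up sample data, not taken from the original question.
my @words = ('zebra', 'éclair', 'apple', 'Zulu');

# Default collator: no tailoring for any specific language (an assumption
# made for this sketch; real data may need tailored rules).
my $collator = Unicode::Collate->new;

# sort() here is the collator's method, not Perl's builtin.
my @sorted = $collator->sort(@words);

print join(' ', @sorted), "\n";         # apple éclair zebra Zulu
```

The accent and the capital letter only act as tie-breakers here, which is why éclair falls between apple and zebra. The same object also has a cmp() method that can be used inside a regular sort block, for instance when sorting records by one field.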

--
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen

