[Excuse me, I have sent a cc to perl-unicode(_at_)perl(_dot_)org;
I expect some help and/or suggestions may be given there.]
Greetings,
I hope you won't mind a few questions related to your module
Unicode::Collate.
I want to correctly sort words in a variety of languages, currently
French, English, Spanish, Portuguese, German and Arabic. I am using
Perl 5.8.1 and Unicode. I think I need Unicode::Collate to get
*correct* sorting. Is this correct?
Sorry, but I think the answer is 'no', at least by default.
"DUCET", the default collation table provided by unicode.org,
sorts across the many scripts in Unicode,
but does not do any language-specific collation.
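
For reference, here is a minimal sketch of what the default behaviour looks like. It only uses the module's new() and sort() methods with no tailoring, so the default table is used. The word list is an illustrative assumption, not data from the original question.

```perl
use strict;
use warnings;
use utf8;                 # this source contains literal non-ASCII characters
use Unicode::Collate;

# Illustrative word list (an assumption, chosen to mix case and accents).
my @words = ("zebra", "éclair", "Apple", "apple", "eclair");

# No parameters: the collator uses the default collation table (DUCET),
# so no language-specific rules are applied.
my $collator = Unicode::Collate->new();
my @sorted   = $collator->sort(@words);

binmode STDOUT, ':utf8';
print "$_\n" for @sorted;
```

Unlike a plain codepoint sort, this keeps the accented word next to its unaccented form instead of pushing it past "zebra", but the order is the same whatever the language of the words.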
Assuming it is, how can I find the correct settings for each of
the languages I'm interested in? I've read U::Collate's doc carefully,
but it is fairly complex, and I'm not sure I could get it right given that
I'm neither a Unicode specialist nor fluent in all the languages
I need to implement. What's the way to go? Are there any wrappers available,
or a standard set of parameters per language?
If proper collation tables in the "UCA" format (the file format
for collation data specified by Unicode Technical Standard #10) are provided,
that can be achieved.
However, such collation table files in UCA format should not be included
in the Unicode::Collate package, since its size should be kept as small
as possible.
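
Short of a full tailored table, a few language conventions can be approximated with constructor parameters alone. The sketch below is only an illustration of that idea, not a complete French tailoring: it uses the module's "backwards" parameter to compare accents (level 2) from the end of the word, as French dictionary ordering traditionally does. The word list is an assumption made for the example.

```perl
use strict;
use warnings;
use utf8;                 # this source contains literal non-ASCII characters
use Unicode::Collate;

# Illustrative word list (an assumption): same letters, different accents.
my @words = ("côté", "coté", "côte", "cote");

# Default collator: accent differences are compared left to right.
my $default = Unicode::Collate->new();

# Same table, but secondary (accent) weights are compared right to left.
my $french = Unicode::Collate->new(backwards => 2);

my @by_default = $default->sort(@words);
my @by_french  = $french->sort(@words);

binmode STDOUT, ':utf8';
print "default:   @by_default\n";
print "backwards: @by_french\n";
```

With the default collator "coté" comes before "côte"; with backwards => 2 the two swap places. Anything beyond this kind of switch (for example, treating a letter pair as a single letter) still needs tailored entries or a tailored table.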
For formats other than UCA, some sources of collation data are available.
Here is a list, as far as I know:
http://oss.software.ibm.com/cvs/icu/locale/
http://std.dkuug.dk/i18n/locales/
I once attempted to analyze the data at std.dkuug.dk,
but I could not find a way to "unicodify" them.
http://std.dkuug.dk/i18n/locales/mnemonic.ds seems to include
non-Unicode characters. As I don't know their meaning and usage,
my attempt made no progress.
Thanks in advance for any insight or pointers you can contribute.
Regards,
--
Eric Cholet
Regards,
SADAHIRO Tomoyuki