perl-unicode

Re: How to use Unicode::Collate in multilinguage apps?

2004-03-28 01:30:04

On Thu, 25 Mar 2004 22:29:08 +0000
Rich <scriptyrich(_at_)yahoo(_dot_)co(_dot_)uk> wrote:

> Hello
>
> How should collation be handled in multitasking, multilingual applications -
> in particular forking servers such as apache/mod_perl based web apps?
>
> I can assume the following:
>
> 1) I'll know the preferred language via a RFC2616 language tag.
> 2) All data will be utf8 encoded Unicode.
> 3) The required language may differ for each request.
>
> I guess Unicode::Collate is the way to go, so can I simply have one
> Unicode::Collate instance per process using the default allkeys.txt table
> file?
>
> Will that give sensible results for most (all?) languages, or do I need to
> customise the collator on the fly when more 'exotic' (for want of a better
> word) languages are requested? Are there other reasons, such as size and/or
> performance issues, why the default allkeys.txt file may not be the way to
> go?

I think that for a script which usually represents a single language,
allkeys.txt defines a fairly acceptable collation order.
For example, the order of hiragana and katakana approximately
follows the custom of the Japanese language.

In contrast, for a script that represents many languages
(say, the Latin script), tailoring is often necessary.

E.g. 'Ä' is sorted as A-umlaut (sometimes as 'AE') in German,
and as one of the additional letters ordered after 'Z' in some
Northern European languages.
But according to the Unicode default collation, 'Ä' is ordered
as a modified 'A' and is equal to 'A' at the primary level.
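
The default behaviour described above can be checked with a short script.
This is only a minimal sketch: it assumes the stock allkeys.txt table shipped
with Unicode::Collate, and the variable names are made up for illustration.

```perl
use strict;
use warnings;
use utf8;                      # this source file contains a literal 'Ä'
use Unicode::Collate;

# Level 1 compares primary weights only, so accents and case are ignored.
my $primary         = Unicode::Collate->new(level => 1);
my $same_at_primary = $primary->eq("Ä", "A");

# The default collator uses all levels, so the diaeresis only breaks the tie:
# 'Ä' lands directly after 'A' and well before 'Z'.
my $full   = Unicode::Collate->new;
my @sorted = $full->sort("Z", "Ä", "A");

binmode STDOUT, ':encoding(UTF-8)';
print "equal at primary level: ", ($same_at_primary ? "yes" : "no"), "\n";
print join(" ", @sorted), "\n";    # A Ä Z
```

A language that wants 'Ä' after 'Z' cannot get that order from this
collator without tailoring.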

> I must stress that I'm ok with most aspects of i18n/l10n - it's specifically
> the correct use of Unicode::Collate in multitasking apps that I'm
> interested in.
>
> Suggestions would be welcome - even more so if they don't involve having to
> know the TR10 docs inside out!

I have written Unicode::Collate::Locale (tentatively) for linguistic tailoring
of the UCA. For it to work, Unicode::Collate must search for allkeys.txt
in every directory in @INC (at present it searches for table files
only under the directory where it is installed).
So Unicode::Collate::Locale requires Unicode::Collate 0.40 or later,
which is not released yet, but a prerelease is available as shown below.

[tarball]
http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-Locale-0.01.tar.gz
[doc]
http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-Locale.html
   Sorry, tailoring is implemented for only a few languages so far.
   More may be added sooner or later...
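
For a forking server, one possible pattern is to keep a small per-process
cache of collators keyed by the requested language. The sketch below assumes
the new(locale => ...) interface of the Unicode::Collate::Locale that was
later bundled with Unicode::Collate; the 0.01 tarball above may differ.
collator_for_tag and %collator_for are hypothetical names, and a real
RFC 2616 tag may need normalising before it is used as a locale name.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate::Locale;

my %collator_for;    # per-process cache: language tag => collator object

# Build a collator the first time a language is requested, then reuse it.
# (Assumption: a language without tailoring gets the default UCA order.)
sub collator_for_tag {
    my ($tag) = @_;
    return $collator_for{$tag} ||= Unicode::Collate::Locale->new(locale => $tag);
}

my @words   = ("Zebra", "Ärger", "Apfel");
my @german  = collator_for_tag('de')->sort(@words);   # Apfel Ärger Zebra
my @swedish = collator_for_tag('sv')->sort(@words);   # Apfel Zebra Ärger

binmode STDOUT, ':encoding(UTF-8)';
print "de: @german\n";
print "sv: @swedish\n";
```

Each child process pays the cost of building a collator once per language
rather than once per request.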

[prerelease] This will be released *after* Perl 5.8.4 (or its RC) is out.
http://homepage1.nifty.com/nomenclator/perl/Unicode-Collate-0.40.tar.gz

regards,
SADAHIRO Tomoyuki