perl-unicode

Re: Caseless and accentless string comparisons

2003-05-14 07:30:06

On Tue, 13 May 2003 22:04:01 -0400
Ben Bennett <fiji(_at_)ayup(_dot_)limey(_dot_)net> wrote:

On Tue, May 13, 2003 at 11:32:57PM +0900, SADAHIRO Tomoyuki wrote:

I'm sorry; the reasons why allkeys.txt is not included in the package are:

(1) its huge file size.

Well, it is less than 1 MB...  But if size is a big problem, then I
could provide it as a separate "module".  Basically it would be
Unicode::Collate::allkeys, and allkeys.pm would be as simple as
possible, just enough to tell the perl tools that the module was installed...

I think the Makefile.PL should offer to get the latest version from
the Unicode web site, though, to make upgrading the file easy.  Although,
since the file appears not to change much, that is probably overkill and I
can just keep the pm in sync with their file.

If this seems reasonable I can implement it if you want.

Perhaps it would be useful if CPAN had something
that fetches allkeys.txt from the Unicode website
and installs it with Unicode::Collate.
Its users would presumably want to upgrade both.

I'm quite ignorant of programming via a network, though.
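
For readers wondering what such a fetch step might look like, here is a minimal sketch and not part of Unicode::Collate. It assumes HTTP::Tiny is available (it ships with recent perls) and that the table is published at the URL shown; both the URL and the helper name fetch_allkeys are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Assumed location of the current table; verify it on the Unicode site.
my $DEFAULT_URL = 'https://www.unicode.org/Public/UCA/latest/allkeys.txt';

# Hypothetical helper: mirror allkeys.txt to $dest.
# Returns 1 if a new copy was written, 0 if the local copy was
# already current, and dies if the request failed.
sub fetch_allkeys {
    my ($dest, %opt) = @_;
    my $url = $opt{url} || $DEFAULT_URL;
    my $ua  = $opt{ua}  || HTTP::Tiny->new(timeout => 30);

    # mirror() sends If-Modified-Since when $dest already exists,
    # so an unchanged file is not downloaded again.
    my $res = $ua->mirror($url, $dest);
    die "Cannot fetch $url: $res->{status} $res->{reason}\n"
        unless $res->{success};

    return $res->{status} == 304 ? 0 : 1;
}
```

Because the helper only writes where it is told to, an installer could ask the user first and skip the call entirely if the existing table should be kept.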

(2) overwriting allkeys.txt with an updated version may break
    someone's tailoring for the old allkeys.txt.

Ah, valid point.  Perhaps a two-step approach would make sense?  You
could have allkeys.txt and, optionally, customkeys.txt.  Then the
ordering would be:
  - User specified always wins
  - If present use customkeys.txt for site tailoring
  - Fall back to allkeys.txt

That way we can safely upgrade allkeys.txt.  If this seems reasonable
let me know and I can provide a patch.
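
The ordering proposed above can be sketched roughly as follows. This is only an illustration of the lookup order, not code from Unicode::Collate; the function name choose_table and the idea of passing a single directory are assumptions.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the table name to use, in the order
# proposed above: explicit user choice, then customkeys.txt,
# then allkeys.txt.
sub choose_table {
    my ($dir, $user_table) = @_;

    # A user-specified table always wins.
    return $user_table if defined $user_table;

    # Otherwise prefer the site tailoring, then the stock table.
    for my $name ('customkeys.txt', 'allkeys.txt') {
        return $name if -f File::Spec->catfile($dir, $name);
    }

    return;    # no table file found in $dir
}
```

The name returned could then be handed to the collator constructor through its table parameter, so an upgraded allkeys.txt never overwrites customkeys.txt.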

As already implemented, each collator object can
use a different table file.   If necessary,
a user can rename a preferred table file as he or she likes.

So, IMHO, it should be enough if the user can decide
whether or not allkeys.txt will be upgraded.

Thank you,
SADAHIRO Tomoyuki