perl-unicode

Re: ICU in Perl plans?

2001-07-13 15:27:38

On Thu, 12 Jul 2001, Nick Ing-Simmons wrote:

Peter Prymmer <pvhp(_at_)forte(_dot_)com> writes:
Is someone currently working on incorporating ICU into Perl? 

I think Nick Ing-Simmons took some codepage data from ICU and
folded it into the Encode module for developer versions of perl,
but it would appear that he retained the Tcl format of the data
rather than the ICU format.

Not true. Rather the other way round as it happens.
ext/Encode/compile can read .ucm files now (which is ICU's format IIRC),
but until the license issue was settled I used the Tcl tables.
("compile" can also read/write both formats).
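
[Editor's illustration, not part of the original message.] For readers
unfamiliar with the .ucm format mentioned above, the following is a minimal
sketch of reading its mapping lines, assuming the usual layout of a CHARMAP
section containing lines such as "<U0041> \x41 |0". The function name
parse_ucm and the SAMPLE data are hypothetical; this is not how
ext/Encode/compile works, only a way to see what the format carries.

```python
import re

# One mapping line: <UXXXX> \xHH[\xHH...] |N
# (the trailing |N fallback indicator is treated as optional here)
MAPPING = re.compile(
    r'^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s+\|(\d))?'
)

def parse_ucm(text):
    """Return {code point: bytes} for round-trip (|0) mappings only."""
    table = {}
    in_charmap = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            break
        # Skip the header, blank lines, and comments.
        if not in_charmap or not line or line.startswith("#"):
            continue
        m = MAPPING.match(line)
        if m is None:
            continue
        codepoint = int(m.group(1), 16)
        raw = bytes(
            int(h, 16) for h in re.findall(r'\\x([0-9A-Fa-f]{2})', m.group(2))
        )
        fallback = int(m.group(3)) if m.group(3) else 0
        if fallback == 0:
            table[codepoint] = raw
    return table

# Invented sample data for illustration; not copied from any ICU file.
SAMPLE = r"""<code_set_name> "sample"
<mb_cur_max> 1
CHARMAP
<U0041> \x41 |0
<U0042> \x42 |0
END CHARMAP
"""

if __name__ == "__main__":
    for cp, raw in sorted(parse_ucm(SAMPLE).items()):
        print(f"U+{cp:04X} -> {raw!r}")
```

Each mapping is spelled out as text, one per line, which is consistent with
the .ucm files being several times larger than a more compact table format.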

The hold-up, now that the license is clear, is just one of tuits to:
A. Study license and get credits and Copyrights into shape.
B. Copy .ucm files from ICU to ext/Encode/Encode/*.ucm
C. Test it.
D. Decide if we want to use ICU's C code rather than my encengine.c
   (I am biased but I think encengine.c's scheme of working directly 
   on UTF-8 form is more appropriate to perl's internals.)

Thank you for the clarification.  I note that in the perl@11359
distribution there is a difference in file size for what appear to be
equivalent *.enc and *.ucm files.  For example:

% ls -l ext/Encode/Encode/posix-bc*
-r--r--r--   1 pvhp     system      1102 Jul  9 07:10 ext/Encode/Encode/posix-bc.enc
-r--r--r--   1 pvhp     system      9847 Jul  9 07:10 ext/Encode/Encode/posix-bc.ucm

% ls -l ext/Encode/Encode/ascii.*
-r--r--r--   1 pvhp     system      1090 Jul  9 07:09 ext/Encode/Encode/ascii.enc
-r--r--r--   1 pvhp     system      4554 Jul  9 07:09 ext/Encode/Encode/ascii.ucm

Which is understandable given the format differences.  Is it the case that
we can get rid of one format in favor of the other?  Do you foresee
maintenance of support for both formats?  Weighed against file size and
licensing considerations (and whatever else might be relevant) which
format would you favor (in other words, can we trim down the size of the
perl tar ball)?  Thank you.

Peter Prymmer

