perl-unicode

RE: Source data for perl encodings

2001-01-08 17:02:53
Mark Leisher <mleisher(_at_)crl(_dot_)nmsu(_dot_)edu> writes:
   Ed> I haven't seen your engine, but I've created such engines and worked
   Ed> on engines others have created. They aren't easy to do. Just
   Ed> supporting a basic set of Internet encodings will be a difficult
   Ed> undertaking. Maybe you've got it all worked out, in which case hats
   Ed> off to you. Otherwise I would recommend a strategy whereby you
   Ed> implement a core set of single-byte conversions (for, say, Western
   Ed> Europe, Central Europe, and Cyrillic languages) with an internal
   Ed> engine and plan to incorporate an optional ICU hook-up for anything
   Ed> else. That way you don't have to maintain and distribute large Asian
   Ed> encoding tables.

What would be nice is some variation of Bruno Haible's libiconv that allows
dynamic loading of mapping tables.  Then Perl would have reasonable conversion
capability at about 1/16 the size of ICU.

What we have in bleadperl right now is ext/Encode/compile, a Perl
script that can read either Tcl's .enc files ('cos that is where we
started) or ICU's .ucm files. It writes .c files containing the data
tables for my "trie state machine".
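
For readers who have not seen the idea, here is a rough sketch of what a
table-driven trie state machine for decoding can look like. It is not the
layout that ext/Encode/compile actually emits: the type names, the
function trie_decode_one() and the toy table (ASCII plus the EUC-JP
hiragana row, lead byte 0xA4) are all assumptions made up for
illustration. Each state is a list of byte ranges; a range either yields a
code point or points at the state for the next byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
typedef struct trie_state trie_state;

typedef struct {
    uint8_t lo, hi;          /* inclusive byte range covered by this edge */
    const trie_state *next;  /* non-NULL: more bytes follow, go to that state */
    uint32_t base;           /* next == NULL: code point assigned to byte lo */
} trie_edge;

struct trie_state {
    const trie_edge *edges;
    size_t n_edges;
};

/* Second byte after lead byte 0xA4: 0xA1..0xF3 -> U+3041..U+3093. */
static const trie_edge trail_edges[] = {
    { 0xA1, 0xF3, NULL, 0x3041 },
};
static const trie_state trail_state = { trail_edges, 1 };

/* Start state: ASCII maps to itself, 0xA4 introduces a two-byte character. */
static const trie_edge start_edges[] = {
    { 0x00, 0x7F, NULL, 0x0000 },
    { 0xA4, 0xA4, &trail_state, 0 },
};
static const trie_state start_state = { start_edges, 2 };

/* Decode one character from src[0..len).
 * Returns the number of bytes consumed and stores the code point in *out,
 * or returns 0 if the input is unmapped or ends in mid-character. */
static size_t trie_decode_one(const trie_state *start, const uint8_t *src,
                              size_t len, uint32_t *out)
{
    assert(start != NULL && src != NULL && out != NULL);
    const trie_state *st = start;
    for (size_t i = 0; i < len; i++) {
        const trie_edge *hit = NULL;
        for (size_t e = 0; e < st->n_edges; e++) {
            if (src[i] >= st->edges[e].lo && src[i] <= st->edges[e].hi) {
                hit = &st->edges[e];
                break;
            }
        }
        if (hit == NULL)
            return 0;                       /* byte not valid in this state */
        if (hit->next == NULL) {            /* leaf: offset within the range */
            *out = hit->base + (uint32_t)(src[i] - hit->lo);
            return i + 1;
        }
        st = hit->next;                     /* descend one level in the trie */
    }
    return 0;                               /* ran out of input */
}
```

With this toy table, the bytes 0xA4 0xA2 walk from the start state into the
trail state and come out as U+3042. Storing ranges rather than one entry
per byte is one way to keep large tables compact; whether the real
generated tables do the same is not something this sketch claims.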

Dynamically loading the .c files is just an exercise in MakeMaker/DynaLoader/XS,
which I can (almost) do in my sleep (well, it gives me bad dreams anyway).

I am quite happy to continue down that road - if "we" are reasonably confident
that I am not just digging a hole for us to fall into.

-- 
Nick Ing-Simmons