Mark Leisher <mleisher(_at_)crl(_dot_)nmsu(_dot_)edu> writes:
Ed> I haven't seen your engine, but I've created such engines and worked
Ed> on engines others have created. They aren't easy to do. Just
Ed> supporting a basic set of Internet encodings will be a difficult
Ed> undertaking. Maybe you've got it all worked out, in which case hats
Ed> off to you. Otherwise I would recommend a strategy whereby you
Ed> implement a core set of single-byte conversions (for, say, Western
Ed> Europe, Central Europe, and Cyrillic languages) with an internal
Ed> engine and plan to incorporate an optional ICU hook-up for anything
Ed> else. That way you don't have to maintain and distribute large Asian
Ed> encoding tables.
What would be nice is some variation of Bruno Haible's libiconv that allows
dynamic loading of mapping tables. Then Perl would have reasonable conversion
capability at about 1/16 the size of ICU.
What we have in bleadperl right now is ext/Encode/compile, a Perl
script that can read either Tcl's .enc files ('cos that is where we
started) or ICU's .ucm files. It writes .c files containing data
tables for my "trie state machine".
Dynamically loading the compiled .c files is just an exercise in
MakeMaker/DynaLoader/XS, which I can (almost) do in my sleep (well, it gives
me bad dreams anyway).
I am quite happy to continue down that road - if "we" are reasonably confident
that I am not just digging a hole for us to fall into.
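
For readers who have not seen the idea, here is a rough sketch of what a table-driven "trie state machine" decoder can look like in C. It is not the layout that ext/Encode/compile actually emits; the names trie_edge, trie_node and toy_decode are made up for this illustration, and the two-node table describes a toy encoding (ASCII plus one lead byte 0xA1 with trail bytes 0xA1..0xA2), not a real charset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct trie_node trie_node;

/* One edge of the trie: a contiguous range of input bytes [lo, hi].
 * (Hypothetical layout, for illustration only.) */
typedef struct {
    uint8_t lo, hi;
    const trie_node *next; /* non-NULL: consume the byte, continue in 'next' */
    uint32_t base;         /* next == NULL: code point assigned to byte 'lo' */
} trie_edge;

struct trie_node {
    const trie_edge *edges;
    size_t n_edges;
};

/* Toy tables of the kind a generator script could write into a .c file. */
static const trie_edge trail_edges[] = {
    { 0xA1, 0xA2, NULL, 0x3000 },      /* 0xA1 0xA1 -> U+3000, 0xA1 0xA2 -> U+3001 */
};
static const trie_node trail_node = { trail_edges, 1 };

static const trie_edge root_edges[] = {
    { 0x00, 0x7F, NULL, 0x0000 },      /* ASCII maps to itself */
    { 0xA1, 0xA1, &trail_node, 0 },    /* lead byte: descend one level */
};
static const trie_node root_node = { root_edges, 2 };

/* Decode 'in' into code points. Returns the number of code points written,
 * or -1 on an unmapped byte, a truncated sequence, or a full output buffer. */
static long toy_decode(const uint8_t *in, size_t in_len,
                       uint32_t *out, size_t out_cap)
{
    const trie_node *state = &root_node;
    size_t n_out = 0;

    assert(in != NULL || in_len == 0);

    for (size_t i = 0; i < in_len; i++) {
        const trie_edge *hit = NULL;

        /* Find the edge of the current node that covers this byte. */
        for (size_t e = 0; e < state->n_edges; e++) {
            const trie_edge *edge = &state->edges[e];
            if (in[i] >= edge->lo && in[i] <= edge->hi) {
                hit = edge;
                break;
            }
        }
        if (hit == NULL)
            return -1;                 /* byte not valid in this state */

        if (hit->next != NULL) {
            state = hit->next;         /* partial sequence: go deeper */
            continue;
        }
        if (n_out == out_cap)
            return -1;                 /* no room left in the output */

        out[n_out++] = hit->base + (uint32_t)(in[i] - hit->lo);
        state = &root_node;            /* sequence complete: restart at root */
    }
    /* Ending anywhere but the root means the input stopped mid-character. */
    return state == &root_node ? (long)n_out : -1;
}
```

Calling toy_decode on the bytes 'H', 'i', 0xA1, 0xA2 should yield the three code points U+0048, U+0069 and U+3001. The point of the shape is that each encoding is pure data: adding one means generating another set of const tables, which is the part that could then live in a separately loaded extension.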
--
Nick Ing-Simmons