perl-unicode

Re: Encode's .enc files and a question

2000-10-25 08:12:57
Mark Leisher <mleisher(_at_)crl(_dot_)nmsu(_dot_)edu> writes:
   Peter> Also: since the .enc files seem to have adopted the four hex digit
   Peter> per code point format how is the Encode module going to handle
   Peter> UTF16 surrogates?
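Peter's question can be made concrete. A four-hex-digit entry can only name code points U+0000..U+FFFF. A character above U+FFFF therefore has to be written either as a UTF-16 surrogate pair (two four-digit values) or with a wider field. The C sketch below shows only the surrogate arithmetic. The function names are hypothetical, and this is not a description of how Encode itself handles the case.

```c
#include <assert.h>
#include <stdint.h>

/* A four-hex-digit field can name only U+0000..U+FFFF. Code points above
   U+FFFF need a UTF-16 surrogate pair (or a wider table field).
   Function names here are hypothetical, for illustration only. */

/* Split a supplementary code point into a high/low surrogate pair. */
static void to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                            /* 20 significant bits remain */
    *hi = (uint16_t)(0xD800 + (cp >> 10));    /* top 10 bits */
    *lo = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* bottom 10 bits */
}

/* Recombine a surrogate pair into the original code point. */
static uint32_t from_surrogates(uint16_t hi, uint16_t lo)
{
    assert(hi >= 0xD800 && hi <= 0xDBFF);
    assert(lo >= 0xDC00 && lo <= 0xDFFF);
    return 0x10000 + (((uint32_t)(hi - 0xD800) << 10) | (uint32_t)(lo - 0xDC00));
}
```

For example, U+10400 splits into the pair D801 DC00, and that pair recombines to U+10400.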

I haven't looked into the format for .enc files, but another thing that
happens is that a single codepoint in the source character set can map to
multiple Unicode codepoints.  An example is the last version of the
Armenian national standard, which includes single codepoints for three very
common ligatures, each of which should be converted to two Unicode codepoints.
The opposite can happen as well: several source codepoints may map to a
single Unicode codepoint.
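A table format that allows one-to-many mappings needs each entry to carry a variable number of target code points rather than exactly one. The minimal C sketch below shows such a byte-to-Unicode table and assumes at most two targets per entry. The rows are placeholders. The ligature byte 0xB0 and its expansion to "f" "i" (U+0066 U+0069) are invented for illustration. They are not taken from the Armenian standard or from any real .enc file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TARGETS 2  /* enough for a ligature that expands to two code points */

/* One row of a hypothetical byte -> Unicode table. A row may expand to
   more than one code point, so it stores a count plus a small array. */
struct map_entry {
    unsigned char byte;
    size_t        count;
    uint32_t      cps[MAX_TARGETS];
};

/* Placeholder rows, NOT taken from any real standard. */
static const struct map_entry table[] = {
    { 0x41, 1, { 0x0041 } },          /* plain one-to-one mapping */
    { 0xB0, 2, { 0x0066, 0x0069 } },  /* hypothetical ligature byte -> "f" "i" */
};

/* Decode n bytes into out[] (capacity cap). Returns the number of code
   points written, or (size_t)-1 on an unmapped byte or if out is too small. */
static size_t decode(const unsigned char *in, size_t n, uint32_t *out, size_t cap)
{
    size_t written = 0;
    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct map_entry *e = NULL;
        for (size_t k = 0; k < sizeof table / sizeof table[0]; k++)
            if (table[k].byte == in[i]) { e = &table[k]; break; }
        if (e == NULL || written + e->count > cap)
            return (size_t)-1;
        for (size_t j = 0; j < e->count; j++)
            out[written++] = e->cps[j];
    }
    return written;
}
```

Here decode() turns the bytes 41 B0 41 into the four code points U+0041 U+0066 U+0069 U+0041. The reverse (many-to-one) direction would need the encoder to look ahead for the longest matching sequence of code points. This sketch does not attempt that.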

Although it looks complicated on the surface, I highly recommend using
Unicode Technical Report #22 on the Unicode website as a guideline for
designing future mapping tables.

All excellent stuff. What we have today is a "trial" API and a prototype
implementation based on what Tcl uses. We needed _something_ and all we 
had was fine words and no actual code. 
(Well, there are the various Unicode::Map* modules, but those all seem to
predate, and coexist badly with, native support for chars > 255. Then
again, I may just be misunderstanding things.)

I would be delighted if people start fixing or improving the prototype,
but we really want to prove that the API is "suitable" for actual use
(by XS modules like Tk, PerlIO, EBCDIC, ...).

What I need for Tk, and what PerlIO will need, is a fast C-callable
API to convert between the various external encodings used by fonts or
in files and perl's internal form.
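One possible shape for such a C-callable interface is a per-encoding record of conversion functions. These functions would work on caller-supplied buffers and report how much input was consumed, so a caller such as a stream layer can resume with the remaining input once more output space is available. The sketch below assumes the internal form is UTF-8 and implements only ISO-8859-1 to UTF-8. The names encoding_t and latin1_to_utf8, and the calling convention itself, are hypothetical. They are not Encode's or PerlIO's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface: one record per external encoding. A converter
   writes as much as fits in dst, stores how many input bytes it used in
   *consumed, and returns the number of output bytes written. */
typedef struct {
    const char *name;
    size_t (*to_internal)(const unsigned char *src, size_t srclen,
                          size_t *consumed,
                          unsigned char *dst, size_t dstlen);
} encoding_t;

/* ISO-8859-1 -> UTF-8 (assumed internal form). Bytes below 0x80 copy
   through; others become two-byte sequences. Stops early, without
   splitting a character, when dst runs out of room. */
static size_t latin1_to_utf8(const unsigned char *src, size_t srclen,
                             size_t *consumed,
                             unsigned char *dst, size_t dstlen)
{
    size_t i = 0, o = 0;
    assert(consumed != NULL);
    for (; i < srclen; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            if (o + 1 > dstlen) break;
            dst[o++] = c;
        } else {
            if (o + 2 > dstlen) break;
            dst[o++] = (unsigned char)(0xC0 | (c >> 6));
            dst[o++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    *consumed = i;
    return o;
}

static const encoding_t latin1 = { "iso-8859-1", latin1_to_utf8 };
```

Calling latin1.to_internal() on the bytes 41 E9 with an 8-byte output buffer yields 41 C3 A9 and consumes both input bytes. With only 2 bytes of output space, it writes 41 and reports one byte consumed, and the caller retries the rest later.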

-- 
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.