Mark Leisher <mleisher(_at_)crl(_dot_)nmsu(_dot_)edu> writes:
Peter> Also: since the .enc files seem to have adopted the four hex digit
Peter> per code point format how is the Encode module going to handle
Peter> UTF16 surrogates?
I haven't looked into the format for .enc files, but another thing that
happens, for example, is that a single source character set codepoint can
map to multiple Unicode codepoints. An example is the latest version of the
Armenian national standard, which includes single codepoints for three very
common ligatures, each of which should be converted to two Unicode codepoints.
The opposite can happen as well.
Although it looks complicated on the surface, I highly recommend using Unicode
Technical Report #22 on the Unicode website as a guideline for designing future
mapping tables.
All excellent stuff. What we have today is a "trial" API and a prototype
implementation based on what Tcl uses. We needed _something_ and all we
had was fine words and no actual code.
(Well, there are the various Unicode::Map* modules, but those all seem to
predate, and coexist badly with, native support for chars > 255 - but I may
just be misunderstanding things.)
I would be delighted if people
started fixing or improving the prototype - but we really want to prove
that the API is "suitable" for actual use (by XS modules like Tk,
PerlIO, EBCDIC, ...).
What I need for Tk, and what PerlIO will need, is a fast C-callable
API to convert between the various external encodings used by fonts or in
files and perl's internal form.
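The shape of such a C-callable conversion routine might look like the sketch below, which converts a buffer from an external 8-bit encoding (Latin-1, for simplicity) into UTF-8, the representation perl uses internally for chars > 255. The function name and signature are hypothetical, not the actual Encode or PerlIO API.

```c
#include <stddef.h>

/* Convert len bytes of Latin-1 in src to UTF-8 in dst.
 * Caller must supply a dst buffer of at least 2*len bytes.
 * Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *src, size_t len,
                      unsigned char *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            dst[out++] = c;                  /* ASCII: passes through */
        } else {
            dst[out++] = 0xC0 | (c >> 6);    /* two-byte UTF-8 sequence */
            dst[out++] = 0x80 | (c & 0x3F);
        }
    }
    return out;
}
```

The point of a plain-C interface like this is that XS modules such as Tk can call it in a tight loop over font or file data without crossing back into Perl for every character.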
--
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.