perl-unicode

Re: Encode's .enc files and a question

2000-10-25 07:42:37

    Peter> Also: since the .enc files seem to have adopted the four hex digit
    Peter> per code point format how is the Encode module going to handle
    Peter> UTF16 surrogates?

I haven't looked into the format for .enc files, but another thing that
happens is that a single source character set codepoint can map to
multiple Unicode codepoints.  An example is the last version of the
Armenian national standard which includes single codepoints for three very
common ligatures, each of which should be converted to two Unicode codepoints.
The opposite can happen as well.
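
To make the one-to-many case concrete, here is a minimal sketch in Python of
a mapping table whose entries are strings rather than single code points.
The byte values (0x41, 0xA0, 0xB0) and the table and function names are
made up for illustration; they are not taken from the Armenian standard or
from Encode's .enc format. The only real data is the pair of Unicode
codepoints U+0565 and U+0582.

```python
# Hypothetical legacy charset; the byte values are invented for illustration.
DECODE_TABLE = {
    0x41: "A",             # ordinary one-to-one mapping
    0xB0: "\u0565",        # one-to-one: a single Armenian letter
    0xA0: "\u0565\u0582",  # one-to-many: one source byte, two Unicode codepoints
}

# Reverse table for encoding: keys are Unicode strings of length 1 or more.
ENCODE_TABLE = {text: byte for byte, text in DECODE_TABLE.items()}
MAX_KEY_LEN = max(len(key) for key in ENCODE_TABLE)


def decode(data: bytes) -> str:
    """Map each source byte to its (possibly multi-codepoint) Unicode string."""
    return "".join(DECODE_TABLE[b] for b in data)


def encode(text: str) -> bytes:
    """Map Unicode back to source bytes, trying the longest sequence first."""
    out = bytearray()
    i = 0
    while i < len(text):
        for n in range(MAX_KEY_LEN, 0, -1):
            chunk = text[i:i + n]
            if chunk in ENCODE_TABLE:
                out.append(ENCODE_TABLE[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"no mapping for U+{ord(text[i]):04X}")
    return bytes(out)
```

The point of the sketch is that a table format with exactly one fixed-width
Unicode value per source codepoint cannot express the 0xA0 entry, and that
the reverse direction needs longest-match lookup instead of a plain
per-character lookup.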

Although it looks complicated on the surface, I highly recommend using
Technical Report #22 on the Unicode website as a guideline for designing
future mapping tables.
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            Cinema, radio, television, magazines are a
New Mexico State University       school of inattention: people look without
Box 30001, Dept. 3CRL             seeing, listen without hearing.
Las Cruces, NM  88003                            -- Robert Bresson