Peter> Also: since the .enc files seem to have adopted the four hex digit
Peter> per code point format how is the Encode module going to handle
Peter> UTF16 surrogates?
I haven't looked into the format of the .enc files, but another thing that
can happen is that a single codepoint in the source character set maps to
multiple Unicode codepoints. An example is the last version of the Armenian
national standard, which includes single codepoints for three very common
ligatures, each of which should be converted to two Unicode codepoints.
The opposite can happen as well: several source codepoints can map to a
single Unicode codepoint.
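
To make the one-to-many case concrete, here is a minimal Python sketch of a
mapping table whose values are sequences of codepoints rather than single
codepoints. The byte values and table entries are made up for illustration;
they are not taken from any real .enc file or from the Armenian standard, and
this is not how the Encode module does it.

```python
# Hypothetical table: one source byte -> a tuple of Unicode codepoints.
TO_UNICODE = {
    0x41: (0x0041,),          # ordinary one-to-one entry
    0xF0: (0x0574, 0x0576),   # made-up "ligature" byte -> two codepoints
}

# Reverse table, keyed by codepoint sequence.
FROM_UNICODE = {cps: byte for byte, cps in TO_UNICODE.items()}
MAX_LEN = max(len(cps) for cps in FROM_UNICODE)


def decode(data: bytes) -> str:
    """Convert source bytes to a Unicode string, expanding multi-codepoint entries."""
    out = []
    for byte in data:
        cps = TO_UNICODE.get(byte)
        if cps is None:
            out.append("\ufffd")  # unmapped byte -> REPLACEMENT CHARACTER
        else:
            out.extend(chr(cp) for cp in cps)
    return "".join(out)


def encode(text: str) -> bytes:
    """Convert back, trying the longest codepoint sequence first."""
    out = bytearray()
    i = 0
    while i < len(text):
        for n in range(min(MAX_LEN, len(text) - i), 0, -1):
            key = tuple(ord(ch) for ch in text[i:i + n])
            if key in FROM_UNICODE:
                out.append(FROM_UNICODE[key])
                i += n
                break
        else:
            raise ValueError(f"no mapping for U+{ord(text[i]):04X}")
    return bytes(out)
```

The point is only that a table format which assumes exactly one Unicode
codepoint per source codepoint cannot express the 0xF0 entry, and that the
reverse direction needs some form of longest-match lookup.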
Although it looks complicated on the surface, I highly recommend using Tech
Report #22 on the Unicode website as a guideline for designing future mapping
tables.
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            Cinema, radio, television, magazines are a
New Mexico State University       school of inattention: people look without
Box 30001, Dept. 3CRL             seeing, listen without hearing.
Las Cruces, NM 88003                 -- Robert Bresson