perl-unicode

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-27 07:15:42

    Philip> On Thu, 26 Oct 2000, Mark Leisher wrote:
    >> Following the first page will be all the other pages, each in the same
    >> format as the first: one number identifying the page followed by 256
    >> double-byte Unicode (UCS-2) characters.  If a character in the encoding
    >> maps to the Unicode character 0000, it means that the character doesn't
    >> actually exist.  If all characters on a page would map to 0000, that
    >> page can be omitted.

    Philip> This would mean that there is no good Unicode character to map
    Philip> ASCII 0x00 to. The obvious character is U+0000 "<control> = NULL",
    Philip> but that's reserved here. So if I'm translating a string
    Philip> containing NULs, those characters will be treated as
    Philip> "not-a-character"?

There is text in font encodings that have a glyph at position 0 mapped to
some non-zero Unicode value.  But yes, using 0x0000 to mean not-a-character
means that no coded character set can have a legitimate mapping to 0x0000.
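
A rough illustration of that ambiguity, in Python only for brevity.  The
names (UNMAPPED, make_ascii_page, lookup) are made up for this sketch and
are not part of Encode or the .enc format; the page is just a 256-entry
list of code points in the spirit of the description quoted above.

```python
# Hypothetical sketch: one 256-entry page where 0x0000 marks "no such character".
UNMAPPED = 0x0000

def make_ascii_page():
    """Map bytes 0x00-0x7F to the same code point; leave 0x80-0xFF unmapped."""
    return [b if b < 0x80 else UNMAPPED for b in range(256)]

def lookup(page, byte):
    """Return the code point for byte, or None when the entry is 0x0000."""
    cp = page[byte]
    return None if cp == UNMAPPED else cp

page = make_ascii_page()
print(lookup(page, 0x41))  # 65 (U+0041)
print(lookup(page, 0x80))  # None: genuinely unmapped
print(lookup(page, 0x00))  # None: NUL -> U+0000 looks exactly like "unmapped"
```

The table entry for byte 0x00 holds the natural value U+0000, but the reader
of the table cannot tell it apart from a hole.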

Basically it just keeps the output Unicode strings from containing
non-characters, by null-terminating at the first unknown character.  When
the unknown character is the first one in the string, you become very
puzzled that nothing seems to be happening.
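
A minimal sketch of that effect, again in Python and under assumptions of
my own: convert() and read_terminated() are hypothetical helpers, and the
consumer is assumed to stop at the first 0x0000 the way C-style string
handling would.

```python
# Hypothetical page: ASCII maps to itself, everything else is 0x0000 (unmapped).
PAGE = [b if b < 0x80 else 0x0000 for b in range(256)]

def convert(data, page=PAGE):
    """Map each input byte through the page; unknown bytes come out as 0x0000."""
    return [page[b] for b in data]

def read_terminated(units):
    """Collect code units up to, but not including, the first 0x0000."""
    out = []
    for u in units:
        if u == 0x0000:
            break
        out.append(chr(u))
    return "".join(out)

# Unknown byte in the middle: the tail of the string silently disappears.
print(repr(read_terminated(convert(b"ab\xe9cd"))))  # 'ab'
# Unknown byte first: the result is empty and nothing seems to happen.
print(repr(read_terminated(convert(b"\xe9abcd"))))  # ''
```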
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            Cinema, radio, television, magazines are a
New Mexico State University       school of inattention: people look without
Box 30001, Dept. 3CRL             seeing, listen without hearing.
Las Cruces, NM  88003                            -- Robert Bresson