perl-unicode

Re: Encode's .enc files and a question

2000-10-26 07:05:16

    Philip> On Wed, 25 Oct 2000, Mark Leisher wrote:
    >> There may some day be a use for the Unicode codepoint 0x0000.  It might
    >> be better to make this 0xFFFF, which is a guaranteed non-character in
    >> Unicode and probably in ISO10646.

    Philip> Isn't that the natural character to use for null-terminated
    Philip> strings? For example, if I'm processing UTF-8 text in C, "foo" is
    Philip> equivalent to 0066 006F 006F 0000. In which case, it's very much
    Philip> in use already.

Yes, zero is a "natural" terminator for strings.  But the first character in
the source string that maps to zero will truncate the output string, leaving
you with a partial conversion and little idea whether it was an algorithm
problem or a character mapping problem.

If the converted string contains 0xFFFF, it will be pretty clear the source
text had bogus characters the moment you display it.
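A minimal C sketch of the failure mode described above. It assumes a hypothetical single-byte table in which bytes 0x00-0x7F map to themselves and every other byte is unmapped and receives a caller-chosen substitute. The names `map_byte`, `convert`, and `ucs2_len` are illustrative only and are not part of Encode or its .enc format.

```c
#include <stddef.h>

/* Hypothetical 8-bit-to-UCS-2 table lookup: bytes 0x00-0x7F map to
   themselves; every other byte is treated as unmapped and receives
   the caller-chosen substitute (an assumption for illustration). */
static unsigned short map_byte(unsigned char b, unsigned short substitute)
{
    return b < 0x80 ? (unsigned short)b : substitute;
}

/* Convert a NUL-terminated byte string into a 0-terminated UCS-2
   buffer. `dst` must hold at least strlen(src) + 1 elements. */
static void convert(const unsigned char *src, unsigned short *dst,
                    unsigned short substitute)
{
    while (*src != '\0')
        *dst++ = map_byte(*src++, substitute);
    *dst = 0;
}

/* Length of a 0-terminated UCS-2 string, as any C consumer that stops
   at the first zero would compute it. */
static size_t ucs2_len(const unsigned short *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

Converting the five-byte source `"ab\xE9" "cd"` with a substitute of 0x0000 gives an output whose apparent length is 2: "cd" is silently lost, with nothing to say why. With 0xFFFF the length is 5 and position 2 holds the FFFF marker, so the bad byte is visible. (The literal is split after `\xE9` because a C hex escape swallows every following hex digit, and `c` and `d` are hex digits.)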
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            Cinema, radio, television, magazines are a
New Mexico State University       school of inattention: people look without
Box 30001, Dept. 3CRL             seeing, listen without hearing.
Las Cruces, NM  88003                            -- Robert Bresson