Mark Leisher <mleisher(_at_)crl(_dot_)nmsu(_dot_)edu> writes:
Nick> Following the first page will be all the other pages, each in the
Nick> same format as the first: one number identifying the page followed
Nick> by 256 double-byte Unicode characters. If a character in the
Nick> encoding maps to the Unicode character 0000, it means that the
Nick> character doesn't actually exist. If all characters on a page would
Nick> map to 0000, that page can be omitted.
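A minimal Python sketch of the page layout quoted above. It assumes the page number is the high byte of the encoding's code and the slot is the low byte, which the thread does not state explicitly. `build_pages` and `MISSING` are hypothetical names, and 0x0000 is used as the "no character" sentinel, as in the original proposal.

```python
# Hypothetical sketch: group an encoding-code -> Unicode mapping into
# 256-entry pages. Assumption: page = high byte of the code, slot = low byte.
MISSING = 0x0000  # "character doesn't exist" sentinel from the proposal

def build_pages(table):
    """table: dict mapping encoding code (0..0xFFFF) to a Unicode code point."""
    pages = {}
    for code, uni in table.items():
        # Create the page on first use, with every slot marked missing.
        page = pages.setdefault(code >> 8, [MISSING] * 256)
        page[code & 0xFF] = uni
    # A page whose slots all map to the sentinel can be omitted.
    return {n: p for n, p in pages.items() if any(u != MISSING for u in p)}
```

For example, `build_pages({0x41: 0x41, 0x1234: 0x0000})` keeps page 0 and drops page 0x12, because that page's only entry maps to the sentinel.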
Mark> There may some day be a use for the Unicode codepoint 0x0000. It might be
Mark> better to make this 0xFFFF, which is a guaranteed non-character in Unicode and
Mark> probably in ISO10646.
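A small Python illustration of the distinction drawn here, using the standard `unicodedata` module: U+0000 is an assigned character (the NULL control), while U+FFFF is one of the 66 Unicode noncharacters. The helper `is_noncharacter` is a hypothetical name written for this sketch and assumes its argument is a valid code point (at most 0x10FFFF).

```python
import unicodedata

def is_noncharacter(cp):
    """True for the 66 Unicode noncharacters: U+FDD0..U+FDEF plus the
    last two code points of every plane (U+xxFFFE and U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

# U+0000 is a real character (general category Cc, a control), so a
# mapping to it can be legitimate...
assert unicodedata.category("\x00") == "Cc"
# ...whereas U+FFFF is a noncharacter and never a valid mapping target.
assert is_noncharacter(0xFFFF) and not is_noncharacter(0x0000)
```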
Documentation notwithstanding, the original Tcl C code does permit the
Unicode code point 0x0000 to appear iff it is in the 0 slot of the other
encoding, e.g. ASCII NUL is mapped to it.
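A hedged Python sketch of the rule described above: a 0x0000 entry means "no character" everywhere except code 0, where it is a genuine mapping of NUL. It assumes pages are stored as a dict of 256-entry lists keyed by the high byte of the code, and reads "the 0 slot" as code 0; both are assumptions, and `lookup` is a hypothetical name, not the Tcl C function.

```python
def lookup(pages, code):
    """Return the Unicode code point for an encoding code, or None.

    Assumed layout: pages maps page number (code >> 8) to a list of
    256 Unicode code points indexed by the low byte (code & 0xFF)."""
    page = pages.get(code >> 8)
    if page is None:
        return None  # omitted page: nothing in it maps to anything
    uni = page[code & 0xFF]
    if uni == 0x0000 and code != 0:
        return None  # 0x0000 outside slot 0 marks a hole in the encoding
    return uni       # includes code 0 -> U+0000 (ASCII NUL)
```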
I made at least an attempt at this in the OO perl stuff as well.
0x0000 has a nice C/perl "falseness" which 0xFFFF lacks, but as we don't
use the .enc tables directly anyway, using 0xFFFF and converting it to C<undef>
at load time (Tcl has no equivalent of undef) would be a reasonable
approach.
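A minimal Python analogue of that load-time conversion, with `None` standing in for Perl's `undef`. It assumes the table uses 0xFFFF as its "no character" sentinel, as Mark suggests; `load_page` and `NONCHAR` are hypothetical names. The sketch also shows why the falseness of 0x0000 cuts both ways: NUL is a valid mapping yet still false, so callers should test for `None` explicitly rather than for truth.

```python
NONCHAR = 0xFFFF  # assumed on-disk "no character" sentinel

def load_page(raw):
    """Convert one page of table entries to an in-memory list,
    turning the 0xFFFF sentinel into None (Python's analogue of undef)."""
    return [None if u == NONCHAR else u for u in raw]

page = load_page([0x0000, 0x0041] + [NONCHAR] * 254)
# NUL survives as a real mapping even though 0 is falsy...
assert page[0] == 0 and page[0] is not None
# ...while the sentinel becomes None, so "is None" is the right test.
assert page[2] is None
```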
--
Nick Ing-Simmons