perl-unicode

Re: Encode's .enc files and a question

2000-10-26 13:23:48
Mark Leisher <mleisher(_at_)crl(_dot_)nmsu(_dot_)edu> writes:
   Nick> Following the first page will be all the other pages, each in the
   Nick> same format as the first: one number identifying the page followed
   Nick> by 256 double-byte Unicode characters.  If a character in the
   Nick> encoding maps to the Unicode character 0000, it means that the
   Nick> character doesn't actually exist.  If all characters on a page would
   Nick> map to 0000, that page can be omitted.

There may some day be a use for the Unicode codepoint 0x0000.  It might be
better to make this 0xFFFF, which is a guaranteed non-character in Unicode and
probably in ISO 10646.

Documentation notwithstanding, the original Tcl C code does permit the
Unicode code point 0x0000 to exist iff it is in the 0 slot of the other
encoding, e.g. ASCII NUL is mapped to it.

I made at least an attempt at this in the OO perl stuff as well.

0x0000 has a nice C/perl "falseness" which 0xFFFF lacks - but as we don't
use the .enc tables directly anyway, using 0xFFFF and converting to C<undef>
at load time (Tcl does not have an undef equivalent) would be a reasonable
approach.
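
A rough sketch of that load-time conversion, in Python only for
illustration (None plays the part of undef). It assumes a page arrives as
256 four-digit hex entries with whitespace ignored; parse_page, MISSING and
the sample page are made-up names and data, not anything from Encode or Tcl.

```python
from typing import List, Optional

MISSING = 0xFFFF  # sentinel proposed in this thread; assumed, not current .enc behaviour


def parse_page(hex_text: str, sentinel: int = MISSING) -> List[Optional[int]]:
    """Turn one 256-entry page of 4-digit hex values into code points or None."""
    digits = "".join(hex_text.split())  # drop spaces and newlines
    if len(digits) != 256 * 4:
        raise ValueError("a page must hold exactly 256 four-digit entries")
    page: List[Optional[int]] = []
    for i in range(0, len(digits), 4):
        value = int(digits[i:i + 4], 16)
        # The sentinel becomes None at load time; 0x0000 stays a real code point.
        page.append(None if value == sentinel else value)
    return page


# Hypothetical page: ASCII in the low half, nothing mapped in the high half.
sample = "".join(f"{n:04X}" for n in range(128)) + "FFFF" * 128
page = parse_page(sample)
```

With the sentinel moved to 0xFFFF, slot 0 can hold 0x0000 without a special
case, and callers test for "is None" (defined() in perl) rather than for
truth, so the lost "falseness" of 0x0000 stops mattering after loading.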

-- 
Nick Ing-Simmons