perl-unicode

Re: .enc docs comments [was Re: Encode's .enc files and a question]

2000-10-26 12:51:48

    Peter> Uncomfortable to say the least.  Could a surrogate scalar encoding
    Peter> be done as an escaped encoding where the high and low pairs are put
    Peter> into the .enc files as HHHHLLLL where both H and L =~ /[0-9A-F]/?
    Peter> hence necessitating a shift to reading 8 characters (possibly
    Peter> implemented using the "E" mechanism?).

Yes.  If you store the surrogate pair as is, the pair is the UTF-16 encoding
of the character.  If you combine the two halves according to the Unicode
surrogate formula, you get a single scalar value, which is what a UTF-32
encoding of the same character would hold.
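
To make the surrogate formula concrete, here is a minimal sketch in Python.
It is not code from Encode or Tcl.  The function name combine_surrogates and
the idea that an entry arrives as eight uppercase hex digits (HHHHLLLL, as
Peter suggests above) are assumptions made purely for illustration.

```python
import re


def combine_surrogates(hex8: str) -> int:
    """Combine a hypothetical HHHHLLLL entry into one Unicode scalar value."""
    # Assumed entry shape: exactly eight uppercase hex digits.
    if not re.fullmatch(r"[0-9A-F]{8}", hex8):
        raise ValueError("expected 8 uppercase hex digits: %r" % hex8)

    high = int(hex8[:4], 16)   # HHHH, the high (leading) surrogate
    low = int(hex8[4:], 16)    # LLLL, the low (trailing) surrogate

    # Unicode reserves D800..DBFF for high and DC00..DFFF for low surrogates.
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("not a high surrogate: %04X" % high)
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("not a low surrogate: %04X" % low)

    # Surrogate formula: each half contributes 10 bits, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    print(hex(combine_surrogates("D834DD1E")))  # prints 0x1d11e
```

Read as two 16-bit units, D834 DD1E is the UTF-16 form of the character; the
value the function returns is the scalar a UTF-32 encoding would store.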

    Peter> How firmly established is the Tcl scheme?  Is it still being
    Peter> hammered out?  I do think that it would be nice to avoid yet
    Peter> another gratuitous file format incompatibility if possible.  So how
    Peter> do the Tcl folks plan to handle surrogates or truly unrecognized
    Peter> characters?

I don't know.  I last used Tcl/Tk in the days of tcl7.?/tk4.? and haven't had
time to play with anything newer.  I do prefer Perl :-)
-----------------------------------------------------------------------------
Mark Leisher
Computing Research Lab            Cinema, radio, television, magazines are a
New Mexico State University       school of inattention: people look without
Box 30001, Dept. 3CRL             seeing, listen without hearing.
Las Cruces, NM  88003                            -- Robert Bresson