ietf-822

Re: printable wide character (was "multibyte") encodings

1993-02-04 17:38:03
> As for UTF-2... I suggest that this WG define two 10646/Unicode charsets:
>
> 1) "flat": canonical form is to transmit each n-bit character as n/8
> octets, in order from most significant octet first to least significant
> octet last.
>
> 2) "UTF-2":  canonical form is a UTF-2 stream.

Well, one is always better than two, of course, but in general this
strikes me as plausible.
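For concreteness, here is a minimal Python sketch of the two canonical forms proposed above. It assumes that "UTF-2" means the file-system-safe transformation format (the scheme later renamed UTF-8) as it was then defined, in which a 31-bit code value takes 1 to 6 octets. The function names encode_flat and encode_utf2 are hypothetical.

```python
# Exclusive upper bounds for UTF-2 sequence lengths 2..6 (assumed original 31-bit layout).
_UTF2_LIMITS = ((2, 0x800), (3, 0x10000), (4, 0x200000), (5, 0x4000000), (6, 0x80000000))


def encode_flat(codes, n):
    """'Flat' form: each code value as n/8 octets, most significant octet first."""
    if n <= 0 or n % 8 != 0:
        raise ValueError("n must be a positive multiple of 8")
    width = n // 8
    out = bytearray()
    for c in codes:
        if not 0 <= c < (1 << n):
            raise ValueError(f"{c:#x} does not fit in {n} bits")
        out += c.to_bytes(width, "big")
    return bytes(out)


def encode_utf2(codes):
    """UTF-2 form: 7-bit values as one octet, larger values as 2..6 octets."""
    out = bytearray()
    for c in codes:
        if not 0 <= c <= 0x7FFFFFFF:
            raise ValueError(f"{c:#x} is outside the 31-bit code space")
        if c < 0x80:
            out.append(c)  # ASCII passes through unchanged
            continue
        length = next(k for k, limit in _UTF2_LIMITS if c < limit)
        shift = 6 * (length - 1)
        lead_marker = (0xFF << (8 - length)) & 0xFF  # 0xC0, 0xE0, 0xF0, 0xF8 or 0xFC
        out.append(lead_marker | (c >> shift))
        while shift:
            shift -= 6
            out.append(0x80 | ((c >> shift) & 0x3F))  # continuation octets 10xxxxxx
    return bytes(out)
```

With n = 16, encode_flat([0x41, 0x4E2D], 16) yields the octets 00 41 4E 2D. The UTF-2 form of the same two characters is 41 E4 B8 AD, so plain ASCII text is unchanged and needs no byte-order convention.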

However, I think charset (1) may need a bit more thought.  Is n variable
from character to character?  How do you know the value of n?  My impression
is that the "flat" 10646 codes are not self-describing in any way, so you
need external information to know how many octets constitute a character.
There are at least two values of n, 16 and 32, that are likely to be
either popular or politically required:  16 because it is what almost
everyone will use, 32 because technically 10646 is a 32-bit standard and
not everyone is happy with the first 16-bit plane's contents (aka Unicode).
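To make the point about self-description concrete, the sketch below reads the same four "flat" octets with n = 16 and with n = 32 and gets different characters. By contrast, a UTF-2 lead octet announces the length of its own sequence. The helper names decode_flat and utf2_sequence_length are hypothetical. The lead-octet rule assumes the original 1-to-6-octet UTF-2 layout, running from 110xxxxx for two octets up to 1111110x for six.

```python
def decode_flat(data, n):
    """Split an octet string into n-bit code values, most significant octet first."""
    if n <= 0 or n % 8 != 0:
        raise ValueError("n must be a positive multiple of 8")
    width = n // 8
    if len(data) % width != 0:
        raise ValueError(f"{len(data)} octets is not a whole number of {n}-bit characters")
    return [int.from_bytes(data[i:i + width], "big") for i in range(0, len(data), width)]


def utf2_sequence_length(lead):
    """Octets in the UTF-2 sequence that begins with this lead octet."""
    if lead < 0x80:
        return 1  # 0xxxxxxx: a single 7-bit character
    # Count the leading 1 bits: 110xxxxx -> 2, 1110xxxx -> 3, ..., 1111110x -> 6.
    count, mask = 0, 0x80
    while lead & mask:
        count += 1
        mask >>= 1
    if count == 1 or count > 6:
        raise ValueError(f"0x{lead:02X} cannot start a UTF-2 sequence")
    return count


stream = bytes([0x00, 0x00, 0x00, 0x41])
print(decode_flat(stream, 16))     # [0, 65]: NUL, then "A"
print(decode_flat(stream, 32))     # [65]: just "A"
print(utf2_sequence_length(0xE4))  # 3
```

Nothing in the flat octets says which reading is right, so n has to travel out of band (for example as part of the charset name). A UTF-2 decoder can resynchronize from the data alone.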

                                         Henry Spencer at U of Toronto Zoology
                                          
henry(_at_)zoo(_dot_)toronto(_dot_)edu   utzoo!henry
