Re: printable wide character (was "multibyte") encodings

As for UTF-2...I suggest that this WG define two 10646/Unicode charsets:

1) "flat": canonical form is to transmit each n-bit character as n/8
octets, in order from most significant octet first to least significant
octet last.

2) "UTF-2":  canonical form is a UTF-2 stream.
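
For concreteness, here is a minimal sketch of the "flat" canonical form in (1). It assumes n is a multiple of 8 and is fixed for the whole stream, which the proposal does not actually state. The name encode_flat is hypothetical and used only for illustration.

```python
def encode_flat(code_points, n_bits):
    """Serialize each code point as n_bits/8 octets, most significant octet first."""
    if n_bits <= 0 or n_bits % 8 != 0:
        raise ValueError("n_bits must be a positive multiple of 8")
    width = n_bits // 8
    # int.to_bytes raises OverflowError if a code point does not fit in `width` octets.
    return b"".join(cp.to_bytes(width, "big") for cp in code_points)


# Two illustrative code points, serialized with n = 16 and with n = 32.
sample = [0x0041, 0x3042]
print(encode_flat(sample, 16).hex())
print(encode_flat(sample, 32).hex())
```

The same two characters occupy four octets with n = 16 and eight octets with n = 32, so sender and receiver have to agree on n by some means outside the octet stream.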


Well, one is always better than two, of course, but in general this
strikes me as plausible.

However, I think charset (1) may need a bit more thought.  Is n variable
from character to character?  How do you know the value of n?  My impression
is that the "flat" 10646 codes are not self-describing in any way, so you
need external information to know how many octets constitute a character.
There are at least two values of n -- 16 and 32 -- which are likely to be
either popular or politically required:  16 because it will be what almost
everyone will use, 32 because technically 10646 is a 32-bit standard and
not everyone is happy with the first 16-bit plane's contents (aka Unicode).
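
To illustrate the ambiguity, here is a small sketch that decodes one octet stream under both candidate values of n. It assumes big-endian octet order as in charset (1); decode_flat is a hypothetical helper, not part of any proposal.

```python
def decode_flat(octets, n_bits):
    """Split an octet stream into n_bits-wide big-endian character codes."""
    if n_bits <= 0 or n_bits % 8 != 0:
        raise ValueError("n_bits must be a positive multiple of 8")
    width = n_bits // 8
    if len(octets) % width != 0:
        raise ValueError("stream length is not a multiple of the character width")
    return [
        int.from_bytes(octets[i:i + width], "big")
        for i in range(0, len(octets), width)
    ]


# The same four octets, read with n = 16 and then with n = 32.
stream = bytes.fromhex("00413042")
print([hex(c) for c in decode_flat(stream, 16)])
print([hex(c) for c in decode_flat(stream, 32)])
```

With n = 16 the stream yields two codes (0x41 and 0x3042); with n = 32 it yields a single code (0x413042). Nothing in the octets themselves says which reading was intended.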

Henry Spencer at U of Toronto Zoology
henry(_at_)zoo(_dot_)toronto(_dot_)edu   utzoo!henry
