> As for UTF-2... I suggest that this WG define two 10646/Unicode charsets:
> 1) "flat": canonical form is to transmit each n-bit character as n/8
> octets, in order from most significant octet first to least significant
> octet last.
> 2) "UTF-2": canonical form is a UTF-2 stream.

Well, one is always better than two, of course, but in general this
strikes me as plausible.
However, I think charset (1) may need a bit more thought. Is n variable
from character to character? How do you know the value of n? My impression
is that the "flat" 10646 codes are not self-describing in any way, so you
need external information to know how many octets constitute a character.
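
To make the ambiguity concrete, here is a rough sketch in Python of what the
"flat" form might look like under my reading of the proposal. The names
encode_flat and decode_flat are made up for illustration and come from no
standard or library; the only thing taken from the proposal is "n/8 octets,
most significant octet first". The same octet stream yields different
character sequences depending on which n the receiver assumes:

```python
def encode_flat(code_points, n):
    """Encode each code point as n/8 octets, most significant octet first.

    Hypothetical helper: n is supplied by the caller because nothing in
    the octet stream itself records it.
    """
    if n % 8 != 0:
        raise ValueError("n must be a multiple of 8")
    width = n // 8
    return b"".join(cp.to_bytes(width, "big") for cp in code_points)


def decode_flat(octets, n):
    """Decode a flat stream, assuming every character is exactly n bits."""
    if n % 8 != 0:
        raise ValueError("n must be a multiple of 8")
    width = n // 8
    if len(octets) % width != 0:
        raise ValueError("stream length is not a multiple of n/8 octets")
    return [
        int.from_bytes(octets[i:i + width], "big")
        for i in range(0, len(octets), width)
    ]


# Two characters sent as 32-bit flat codes...
stream = encode_flat([0x0041, 0x00E9], 32)

# ...decode correctly only if the receiver also assumes n = 32.
as_32 = decode_flat(stream, 32)

# A receiver assuming n = 16 gets four characters, two of them zero.
as_16 = decode_flat(stream, 16)
```

Neither decode raises an error, which is the problem: without some external
label saying which n is in use, the receiver cannot tell the two readings
apart.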
There are at least two values of n -- 16 and 32 -- which are likely to be
either popular or politically required: 16 because it will be what almost
everyone will use, 32 because technically 10646 is a 32-bit standard and
not everyone is happy with the first 16-bit plane's contents (aka Unicode).
Henry Spencer at U of Toronto Zoology
henry(_at_)zoo(_dot_)toronto(_dot_)edu utzoo!henry