ietf-822

Re: printable wide character (was "multibyte") encodings

1993-01-11 13:30:13
Henry,
   There are a couple of problems with UTF-2, at least as I understand
it.  (I still haven't seen the Standard and, given some of the history,
am making only tentative statements until ITTF signs off and it goes to
the printer.  As far as I know, that hasn't happened.)

(1) It is still a variable-length encoding.  Variable-length encodings
are good/easy for some operating systems, programming languages, and
character models (including, given some other constraints, the UNIX/C
null-terminated string model), and much more difficult for others.
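
A minimal sketch of where the difficulty in (1) comes from, not taken
from the Standard.  It assumes UTF-2 octets look like those of the later
UTF-8 encoding, so Python's "utf-8" codec stands in for UTF-2 here, and
the function name is made up for illustration.  Under that assumption,
finding the n-th character means scanning from the start of the string,
while a byte-oriented, null-terminated model keeps working because no
zero octet appears inside a non-NUL character.

```python
def byte_offset_of_char(data: bytes, n: int) -> int:
    """Return the byte offset where character number n (0-based) starts.

    Assumption: octets of the form 10xxxxxx continue a character and
    every other octet starts one, as in UTF-8.
    """
    seen = -1
    for offset, octet in enumerate(data):
        if octet & 0xC0 != 0x80:  # not a continuation octet
            seen += 1
            if seen == n:
                return offset
    raise IndexError("data holds fewer than n + 1 characters")


# 'a', e-acute (U+00E9), a Han ideograph (U+6F22), 'b'
sample = "a\u00e9\u6f22b".encode("utf-8")

# Character positions and byte positions drift apart after the first
# non-ASCII character, so indexing is a linear scan, not arithmetic.
offsets = [byte_offset_of_char(sample, i) for i in range(4)]
print(len(sample), offsets)

# No zero octet appears, so C-style string termination is undisturbed.
print(0 in sample)
```

With a fixed-width encoding the same lookup would be a multiplication
(n times the character width) rather than a scan.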

(2) It is ASCII-optimized.  To the degree to which a character [sub]set
(i.e., a 10646 "row") is close to ASCII, it gets a minimum number of
octets per character (one for ASCII itself).  Character [sub]sets that
are quite different, e.g., Asian ideographic character sets, are fairly
severely penalized, ending up not in two octets but in three, four, or
more.  It might be reasonable at this stage in the Internet's evolution
to accept that penalty.  But I can't defend that position-- not only are
there the oft-repeated issues of US-bias, Euro-bias, or
Roman-character-bias, but, as the network becomes heavily used in Asia,
it seems to doom us to a second transition, presumably to unencoded
10646.  
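
The size penalty in (2) can be sketched as follows.  This is an
illustration only: the thresholds are those of the FSS-UTF scheme as it
was later standardized (as UTF-8, originally defined up to six octets),
and the UTF-2 text under discussion may differ in detail.  The names
`_LIMITS` and `utf2_octets` are hypothetical.

```python
# Assumed limits: the largest code value that fits in 1, 2, ... 6 octets.
_LIMITS = (0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF)


def utf2_octets(code: int) -> int:
    """Return how many octets a code value needs under the assumed limits."""
    if code < 0:
        raise ValueError("code value must be non-negative")
    for octets, limit in enumerate(_LIMITS, start=1):
        if code <= limit:
            return octets
    raise ValueError("code value does not fit in 31 bits")


samples = {
    "ASCII letter A": 0x41,
    "Latin e-acute": 0xE9,
    "Han ideograph": 0x6F22,
    "first value beyond 16 bits": 0x10000,
}

for name, code in samples.items():
    # A fixed-width two-octet form would spend 2 octets on each of the
    # first three samples and could not represent the fourth at all.
    print(f"{name}: {utf2_octets(code)} octet(s)")
```

Under these assumed limits, an ideograph that fits in two octets
unencoded takes three when encoded, and ASCII takes one instead of two.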

Also...
They are properly thought of as peers, and it is reasonable and proper
to consider something like "10646-UTF-2" to be a character set, one
...
  Again, I haven't seen the final formal document, but had understood
that UTF-2 was in the "appendix not part of the standard" category.  If
that is true, this statement is interesting, possibly valid in a better
universe, but false.   With all of the changes SC2 made between DIS-1
and DIS-2, if they had wanted these as peer/alternate encodings, they
would have said so.  As I understand it, at least one reason why they
didn't was to avoid the accusation (probably true anyway) that one can't
use 10646 without a profile.

Perhaps like Steve, I don't believe that there is any path of no
resistance, or even a path of very low resistance.  The switchover is
going to cause a certain amount of pain, and I don't want to suffer
twice if it can be helped.

  --john