Re: MIME & ISO-10646

1) UCS-2 is 16 bits and is the only part of the standard that has
any data in it. Mail that allows all 8-bit characters does not have
to do anything special except mark where the boundaries of the UCS-2
data are in the file. Each character is, after all, just two 8-bit bytes.

2) Encode the data with UTF-1 which is mentioned in a non-normative
appendix. This prevents some of the characters that cause problems
from appearing as either of the two bytes.

3) Encode the data with UTF-2 or FSS-UTF, a different but similar
encoding that has been worked out by a joint committee of Uniforum and
X/Open.  This eliminates even more problem characters, in particular
the ones that typically cause grief if used in a Unix filename.
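
To make the "problem characters" concrete, here is a small Python
sketch of what raw UCS-2 (choice 1) puts on the wire. It is only an
illustration: the helper names are made up, the list of problem bytes
(NUL and "/") is an assumption, and Python's "utf-16-be" codec is used
as a stand-in for big-endian UCS-2, which it matches for 16-bit
values outside the surrogate range.

```python
# Hypothetical helpers, for illustration only.

def ucs2_bytes(text):
    """Return the big-endian two-byte form of each character.

    Assumption: "utf-16-be" stands in for UCS-2; the two agree for
    16-bit code points outside the surrogate range.
    """
    return text.encode("utf-16-be")


def problem_bytes(data, problems=(0x00, 0x2F)):
    """Return the set of byte values in data that are NUL or '/'."""
    return {b for b in data if b in problems}


# 'A' is U+0041, so its high byte is NUL.
# U+4E2F has a low byte of 0x2F, the same value as ASCII '/'.
raw = ucs2_bytes("A\u4e2f")
print(raw.hex(" "))                 # 00 41 4e 2f
print(sorted(problem_bytes(raw)))   # [0, 47]
```

Both a NUL and a "/" byte show up even though the text contains
neither character, which is the difficulty choices 2 and 3 try to
avoid.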


I would strongly prefer that a *single* encoding be chosen for the
Internet (with charset value "iso-10646", what else?).

My personal preference is for choice 3, UTF-2 (assuming this is
the one designed by the Plan 9 folks), because it is the only one that
is fully upwards compatible with ASCII (with the others, bytes in the
7-bit range can occur that do not represent themselves).
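
A rough Python sketch of the compatibility property, for anyone who
wants to try it. It assumes that UTF-2 uses the one-to-three-byte
layout that is today known as UTF-8 for 16-bit values; if the
committee's encoding differs, the bit patterns below are wrong. The
function names are invented for this example.

```python
# Assumed layout (not taken from the standard's text):
#   0000-007F  ->  0xxxxxxx
#   0080-07FF  ->  110xxxxx 10xxxxxx
#   0800-FFFF  ->  1110xxxx 10xxxxxx 10xxxxxx

def encode_utf2(code_point):
    """Encode one 16-bit code point into 1 to 3 bytes."""
    if not 0 <= code_point <= 0xFFFF:
        raise ValueError("only 16-bit values are handled in this sketch")
    if code_point < 0x80:
        # 7-bit characters are passed through unchanged.
        return bytes([code_point])
    if code_point < 0x800:
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def encode_text(text):
    """Encode a string character by character."""
    return b"".join(encode_utf2(ord(ch)) for ch in text)


print(encode_text("lwj"))              # b'lwj'
print(encode_utf2(0x4E2F).hex(" "))    # e4 b8 af
```

Under that assumption, pure ASCII text is byte-for-byte unchanged,
and every byte of a multi-byte sequence has its high bit set, so no
7-bit value can appear as part of another character.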

UTF-2 is used by the Plan 9 team, as reported in their forthcoming USENIX
paper (a previous version and manual pages for some supporting routines
are contained in the Plan 9 manual, available via anonymous FTP from
research.att.com).

Comments?

--
Luc Rooijakkers                                 Internet: lwj(_at_)cs(_dot_)kun(_dot_)nl
Faculty of Mathematics and Computer Science     UUCP: uunet!cs.kun.nl!lwj
University of Nijmegen, the Netherlands         tel. +3180652271