1) UCS-2 is 16 bits and is the only part of the standard that has
any data in it. Mail that allows all 8-bit characters does not have
to do anything special except mark the boundaries of the UCS-2
data in the file. Each character is, after all, just two 8-bit bytes.
2) Encode the data with UTF-1, which is mentioned in a non-normative
appendix. This prevents some of the characters that cause problems
from appearing as either of the two bytes.
3) Encode the data with UTF-2 or FSS-UTF, a different but similar
encoding that has been worked out by a joint committee of Uniforum and
X/Open. This eliminates even more problem characters, in particular
the ones that typically cause grief if used in a Unix filename.
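
To make the difference between choices 1 and 3 concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: Python's built-in "utf-16-be" codec stands in for raw big-endian UCS-2, and its "utf-8" codec stands in for UTF-2/FSS-UTF, on the assumption that the two agree for 16-bit values. The helper name problem_bytes and the sample string are made up for the example.

```python
# Sketch only. Assumptions: "utf-16-be" stands in for raw UCS-2 and
# "utf-8" stands in for UTF-2/FSS-UTF (assumed identical for 16-bit values).
PROBLEM_BYTES = {0x00, 0x2F}  # NUL and '/', both special in Unix filenames


def problem_bytes(encoded: bytes) -> list:
    """Return the offsets at which a NUL or '/' byte occurs."""
    return [i for i, b in enumerate(encoded) if b in PROBLEM_BYTES]


# 'A' followed by U+012F, a character whose low-order byte happens to be 0x2F.
name = "A\u012F"
ucs2 = name.encode("utf-16-be")
utf2 = name.encode("utf-8")

print("UCS-2:", ucs2.hex(" "), "problem offsets:", problem_bytes(ucs2))
print("UTF-2:", utf2.hex(" "), "problem offsets:", problem_bytes(utf2))
```

With raw UCS-2, both the zero high byte of 'A' and the low byte of U+012F collide with bytes that a Unix filename cannot contain; in the multi-byte form neither appears.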
I would strongly prefer that a *single* encoding be chosen for the
Internet (with charset value "iso-10646", what else?).
My personal preference is for UTF-2 (assuming this is the one
designed by the Plan 9 folks), because it is the only one that is
fully upward compatible with ASCII (with the others, 7-bit
characters do not represent themselves).
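
As a sketch of what "upward compatible with ASCII" means here, the following Python encodes 16-bit values by hand. The bit layout (1, 2 or 3 bytes, with every byte of a multi-byte sequence having its high bit set) is my understanding of the Plan 9 scheme and should be read as an assumption, not as the text of any standard; utf2_encode is a hypothetical name.

```python
def utf2_encode(code: int) -> bytes:
    """Encode one 16-bit value (assumed Plan 9 style layout, sketch only)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("only 16-bit values are handled in this sketch")
    if code < 0x80:
        # 7-bit values are emitted unchanged, as a single byte.
        return bytes([code])
    if code < 0x800:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])


# Plain ASCII text comes out byte-for-byte identical.
text = "Nijmegen"
encoded = b"".join(utf2_encode(ord(c)) for c in text)
assert encoded == text.encode("ascii")

# Every byte of a multi-byte sequence has the high bit set, so a byte
# below 0x80 in the stream can only ever be the ASCII character itself.
for code in (0x03B1, 0x4E2D):
    assert all(b >= 0x80 for b in utf2_encode(code))
```

Under that layout a 7-bit byte in the encoded stream always stands for itself, which is the property I care about for mail and filenames.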
UTF-2 is used by the Plan 9 team, as reported in their forthcoming USENIX
paper (a previous version and manual pages for some supporting routines
are contained in the Plan 9 manual, available via anonymous FTP from
research.att.com).
Comments?
--
Luc Rooijakkers Internet: lwj(_at_)cs(_dot_)kun(_dot_)nl
Faculty of Mathematics and Computer Science UUCP: uunet!cs.kun.nl!lwj
University of Nijmegen, the Netherlands tel. +3180652271