ietf-822

Re: printable multibyte encodings

1992-12-16 13:17:54
> Perhaps I'm being dense, but it seems to me that any 16-bit (or 32-bit,
> or 128-bit, or whatever) characters can be (and typically are)
> represented as 8-bit octets in a canonical order...

Just a straight 16-bit representation of a 16-bit character has two problems.
First, if most of the characters are in fact ASCII -- often the case -- then
it is twice as big as it needs to be.  Second, it often includes octets that
cause trouble for software, e.g. ASCII NULs (not an issue if it's inside
a MIME encoding, but significant in other contexts).  Straight-16-bit would
often be the preferred representation inside programs, but for storage and
transmission (and use in filenames etc.), an encoding which avoids these
problems is desirable.
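
To make the two problems concrete, here is a small Python sketch.  It assumes
that Python's "utf-16-be" codec (two octets per character, high octet first,
no byte-order mark) is a fair stand-in for a straight 16-bit representation;
the sample string and variable names are purely illustrative.

```python
# A mostly-ASCII string, stored two ways.
text = "Hello, world"

# Straight 16-bit: every character becomes two octets, high octet first.
straight16 = text.encode("utf-16-be")

# Plain ASCII: one octet per character.
ascii8 = text.encode("ascii")

# Problem 1: the 16-bit form is twice the size of the ASCII form.
size_ratio = len(straight16) / len(ascii8)

# Problem 2: the high octet of every ASCII character is a NUL.
nul_count = straight16.count(b"\x00")
```

For pure-ASCII text, size_ratio is 2 and there is one NUL octet per character.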

UTF-2, in particular, is an encoding of 16-bit characters that represents
ASCII characters as themselves (one octet apiece) and is "file-system
safe", avoiding octets that have special meaning to common software.
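
A rough sketch of such an encoding follows.  It assumes that UTF-2 here means
the scheme also known as FSS-UTF and later standardized as UTF-8; the bit
layout is that of UTF-8 restricted to 16-bit values and is not spelled out in
this message.  The function names encode_char and encode_text are hypothetical.

```python
def encode_char(code: int) -> bytes:
    """Encode one 16-bit character value as 1 to 3 octets (UTF-8 layout, assumed)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a 16-bit character value")
    if code < 0x80:
        # ASCII characters are represented as themselves, one octet apiece.
        return bytes([code])
    if code < 0x800:
        # Two octets: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code >> 6), 0x80 | (code & 0x3F)])
    # Three octets: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (code >> 12),
        0x80 | ((code >> 6) & 0x3F),
        0x80 | (code & 0x3F),
    ])


def encode_text(codes) -> bytes:
    """Encode a sequence of 16-bit character values."""
    return b"".join(encode_char(c) for c in codes)


encoded = encode_text([0x41, 0x2F, 0xE9, 0x4E2D])
```

Every octet produced for a non-ASCII character has its high bit set, so NUL,
"/" and the other ASCII octets appear in the output only when they stand for
themselves.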

                                         Henry Spencer at U of Toronto Zoology
                                          
henry(_at_)zoo(_dot_)toronto(_dot_)edu   utzoo!henry
