> Perhaps I'm being dense, but it seems to me that any 16-bit (or 32-bit,
> or 128-bit, or whatever) characters can be (and typically are)
> represented as 8-bit octets in a canonical order...
Just a straight 16-bit representation of a 16-bit character has two problems.
First, if most of the characters are in fact ASCII -- often the case -- then
it is twice as big as it needs to be. Second, it often includes octets that
cause trouble for software, e.g. ASCII NULs (not an issue if it's inside
a MIME encoding, but significant in other contexts). Straight-16-bit would
often be the preferred representation inside programs, but for storage and
transmission (and use in filenames etc.), an encoding which avoids these
problems is desirable.
UTF-2, in particular, is an encoding of 16-bit characters that represents
ASCII characters as themselves (one octet apiece) and is "file-system
safe", avoiding octets that have special meaning to common software.
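To make the point concrete, here is a sketch of the variable-length scheme that
UTF-2 (later standardized as FSS-UTF and then UTF-8) uses for 16-bit characters.
The function name and the Python rendering are mine, not part of the original
scheme description; only the bit layouts are from the encoding itself. Note the
two properties claimed above: ASCII comes out as itself, one octet apiece, and
every octet of a multi-octet sequence has the high bit set, so no NUL, slash, or
other ASCII byte can appear by accident.

```python
def encode_utf2(code_point):
    """Encode one 16-bit character as UTF-2 octets (illustrative sketch).

    Layouts:  0xxxxxxx                      for U+0000..U+007F (ASCII)
              110xxxxx 10xxxxxx             for U+0080..U+07FF
              1110xxxx 10xxxxxx 10xxxxxx    for U+0800..U+FFFF
    """
    if code_point < 0x80:
        # ASCII: represented as itself, one octet
        return bytes([code_point])
    if code_point < 0x800:
        # Two octets; both have the high bit set, so neither is
        # mistakable for NUL, '/', or any other ASCII character
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # Three octets for the rest of the 16-bit range
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

So an ASCII-heavy text costs one octet per character rather than two, and the
encoded stream is safe to pass through octet-oriented software and to use in
filenames.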
Henry Spencer at U of Toronto Zoology
henry@zoo.toronto.edu             utzoo!henry