> Perhaps I'm being dense, but it seems to me that any 16-bit (or 32-bit,
> or 128-bit, or whatever) characters can be (and typically are)
> represented as 8-bit octets in a canonical order...
Just a straight 16-bit representation of a 16-bit character has two problems.
First, if most of the characters are in fact ASCII -- often the case -- then
it is twice as big as it needs to be. Second, it often includes octets that
cause trouble for software, e.g. ASCII NULs (not an issue if it's inside
a MIME encoding, but significant in other contexts). Straight-16-bit would
often be the preferred representation inside programs, but for storage and
transmission (and use in filenames etc.), an encoding which avoids these
problems is desirable.
UTF-2, in particular, is an encoding of 16-bit characters that represents
ASCII characters as themselves (one octet apiece) and is "file-system
safe", avoiding octets that have special meaning to common software.
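To make the point concrete, here is a sketch of the variable-length scheme that
UTF-2 (later standardized as FSS-UTF and then UTF-8) uses for 16-bit characters.
The function name and the Python rendering are mine, not part of the original
scheme description; only the bit layouts are from the encoding itself. Note the
two properties claimed above: ASCII comes out as itself, one octet apiece, and
every octet of a multi-octet sequence has the high bit set, so no NUL, slash, or
other ASCII byte can appear by accident.

```python
def encode_utf2(code_point):
    """Encode one 16-bit character as UTF-2 octets (illustrative sketch).

    Layouts:  0xxxxxxx                      for U+0000..U+007F (ASCII)
              110xxxxx 10xxxxxx             for U+0080..U+07FF
              1110xxxx 10xxxxxx 10xxxxxx    for U+0800..U+FFFF
    """
    if code_point < 0x80:
        # ASCII: represented as itself, one octet
        return bytes([code_point])
    if code_point < 0x800:
        # Two octets; both have the high bit set, so neither is
        # mistakable for NUL, '/', or any other ASCII character
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    # Three octets for the rest of the 16-bit range
    return bytes([0xE0 | (code_point >> 12),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])
```

So an ASCII-heavy text costs one octet per character rather than two, and the
encoded stream is safe to pass through octet-oriented software and to use in
filenames.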
Henry Spencer at U of Toronto Zoology
henry@zoo.toronto.edu             utzoo!henry