[from way back in January]
...
2. Take the plunge and embrace wide characters with open
arms: define a Content-transfer-encoding which encodes
16- or 32-bit characters, and model the communication path
between the content-transfer-encoding decoder and the
richtext parser as a stream of 16- or 32-bit characters.
(Whether this stream is implemented as an octet stream in
some canonical order, or as some word-oriented IPC
mechanism, is an implementation detail.) The point is
that the richtext parser's front-end "get a character"
primitive would get a wide, multioctet character. (The
special '<' character would therefore appear as a 16- or
32-bit quantity with value 60.)
Yes!
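A minimal sketch of the "get a character" primitive described in the
quoted text, assuming the stream arrives as octets in big-endian order
(the quoted text leaves the canonical order to the implementation).
The names wide_chars and command_starts are hypothetical, chosen only
for illustration; they come from no richtext implementation.

```python
from typing import Iterator, List

LESS_THAN = 60  # the value the quoted text gives for the special '<' character


def wide_chars(octets: bytes, width: int = 2) -> Iterator[int]:
    """Yield one integer per wide character from an octet stream.

    Assumption: big-endian octet order, width 2 (16-bit) or 4 (32-bit).
    """
    if len(octets) % width != 0:
        raise ValueError("octet stream ends in the middle of a character")
    for i in range(0, len(octets), width):
        yield int.from_bytes(octets[i:i + width], "big")


def command_starts(octets: bytes, width: int = 2) -> List[int]:
    """Return the character positions at which a '<' appears.

    The comparison is against a whole wide character, never a single octet.
    """
    return [n for n, c in enumerate(wide_chars(octets, width)) if c == LESS_THAN]
```

The parser side never sees individual octets: an octet with value 60
that happens to be half of some other 16-bit character is not mistaken
for '<', because only complete characters are compared.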
Keith Moore last month bemoaned the suggestion of a
departure from the familiar and comfortable byte stream.
If we're going to use characters larger than 8 bits, some
departure somewhere from an octet stream is obviously
(and by definition) necessary. Recalling the proper
definition of "byte", however, we can if we wish continue
to think about byte streams, as long as we remember that
a byte may have more than 8 bits. ...
Yes again.
Compilers were once thought to be nearly impossible to write,
until (among other things) we learned to separate lexical
analysis from parsing, which turned out to make the task much
cleaner and more tractable. In an analogous way, I'd like to
keep transfer encoding issues clearly separated from character
set issues, ...
I'd like to second this (perhaps a bit late).
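The separation argued for above can be sketched as two independent
layers. This is only an illustration: base64 stands in for whatever
transfer encoding is used, the charset labels are Python codec names,
and the function names are hypothetical. None of it is taken from the
proposal itself.

```python
import base64


def decode_transfer(body: str) -> bytes:
    """Transfer-encoding layer: mail-safe text in, raw octets out.

    Knows nothing about character sets. base64 is an assumed stand-in.
    """
    return base64.b64decode(body)


def decode_charset(octets: bytes, charset: str) -> str:
    """Character-set layer: raw octets in, characters out.

    Knows nothing about how the octets travelled through the mail system.
    """
    return octets.decode(charset)


def read_body(body: str, charset: str) -> str:
    """Compose the two layers, as a lexer feeds a parser."""
    return decode_charset(decode_transfer(body), charset)
```

Because neither layer depends on the other, the same transfer decoder
serves an 8-bit character set and a 16-bit one alike; only the charset
argument changes.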
...
Steve Summit
scs(_at_)adam(_dot_)mit(_dot_)edu
--
Rick Troth <troth(_at_)rice(_dot_)edu>, Rice University, Information Systems