Re: printable wide character (was "multibyte") encodings

1993-05-13 11:27:32
        [from way back in January] 
     2.       Take the plunge and embrace wide characters with open
      arms: define a Content-transfer-encoding which encodes 16-
      (or 32-) bit characters, and model the communication path
      between the content-transfer-encoding decoder and the
      richtext parser as a stream of 16- or 32-bit characters.
      (Whether this stream is implemented as an octet stream in
      some canonical order, or as some word-oriented IPC
      mechanism, is an implementation detail.)  The point is
      that the richtext parser's front-end "get a character"
      primitive would get a wide, multioctet character.  (The
      special '<' character would therefore appear as a 16- or
      32-bit quantity with value 60.)
      Keith Moore last month bemoaned the suggestion of a 
      departure from the familiar and comfortable byte stream. 
      If we're going to use characters larger than 8 bits, some 
      departure somewhere from an octet stream is obviously 
      (and by definition) necessary.  Recalling the proper 
      definition of "byte", however, we can if we wish continue 
      to think about byte streams, as long as we remember that 
      a byte may have more than 8 bits.   ... 
        Yes again. 
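
To make the quoted "get a character" primitive concrete, here is a
minimal sketch, not anything proposed in the thread itself.  It assumes
the wide-character stream is carried as octets in big-endian order (one
possible "canonical order"), and the names get_wide_char and wide_chars
are made up for illustration.  The parser's front end only ever sees
integers, so '<' arrives as the value 60 whether characters are 16 or
32 bits wide.

```python
import io

LESS_THAN = 60  # value of the special '<' character, at any width


def get_wide_char(stream, width=2):
    """Read one wide character from an octet stream.

    Assumes big-endian octet order (an illustrative choice).
    Returns the character's integer value, or None at end of stream.
    """
    octets = stream.read(width)
    if not octets:
        return None
    if len(octets) != width:
        raise ValueError("stream ended in the middle of a wide character")
    return int.from_bytes(octets, "big")


def wide_chars(stream, width=2):
    """Yield wide characters until the stream is exhausted."""
    while True:
        code = get_wide_char(stream, width)
        if code is None:
            return
        yield code


# "<b>" as three 16-bit characters in big-endian octet order.
data = io.BytesIO(bytes([0x00, 0x3C, 0x00, 0x62, 0x00, 0x3E]))
codes = list(wide_chars(data, width=2))
starts_with_markup = codes[0] == LESS_THAN
```

Switching to 32-bit characters changes only the width argument; the
code that looks for '<' stays the same, which is the point of pushing
the octet handling below the parser.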
Compilers were once thought to be nearly impossible to write,
until (among other things) we learned to separate lexical
analysis from parsing, which turned out to make the task much
cleaner and more tractable.  In an analogous way, I'd like to
keep transfer encoding issues clearly separated from character
set issues,   ... 
        I'd like to second this.   (perhaps a bit late) 
                                      Steve Summit 
Rick Troth <troth(_at_)rice(_dot_)edu>, Rice University, Information Systems