Re: printable wide character (was "multibyte") encodings

To:  Steve Summit <scs(_at_)adam(_dot_)mit(_dot_)edu>
Subject:  Re: printable wide character (was "multibyte") encodings
Date:  Thu, 13 May 1993 13:18:03 -0500 (CDT)

     2.     Take the plunge and embrace wide characters with open
    arms: define a Content-transfer-encoding which encodes 16
    (or 32) -bit characters...

[...]

      Yes!

No!

    Keith Moore last month bemoaned the suggestion of a 
    departure from the familiar and comfortable byte stream. 
    If we're going to use characters larger than 8 bits, some 
    departure somewhere from an octet stream is obviously 
    (and by definition) necessary.


The mapping from other "character sizes" to an octet-stream can be
defined on a per-content-type basis.  There's no need to define
additional content-transfer-encodings for this purpose.

Think of it this way.  You want to define a new data type that needs
to express things in 32-bit quantities.  You can either:

a) Define a new content-transfer-encoding that encodes not octets,
   but 32-bit words.  Define your content-type in terms of that encoding.
   While you're at it, define how a MIME mail reader is going to talk
   to the program that implements your new content-type; the mechanism
   is sure to be at least different from the mechanism now in use.
   If necessary, define extensions to the .mailrc and .mnh_file formats to
   specify a 32-bit-wide pipe rather than an 8-bit-wide pipe.  Get
   those who produce MIME user agents to upgrade their products to support
   the new content-transfer-encoding.

-or-

b) Define the canonical form of your new content-type in terms of an
   octet-stream.  Then the mapping of 32-bit words to octets is defined
   by your content-type.
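
As a rough illustration of option (b), here is a minimal sketch of a
content-type defining its own mapping from 32-bit words to octets.  The
function names are hypothetical, and big-endian byte order is an assumption
made for the example; the real choice belongs to whoever defines the
content-type.

```python
import struct

def words_to_octets(words):
    """Serialize unsigned 32-bit values as octets, most significant first.

    Big-endian order is assumed here for illustration only.
    """
    return b"".join(struct.pack(">I", w) for w in words)

def octets_to_words(octets):
    """Recover the 32-bit values from the canonical octet-stream."""
    if len(octets) % 4 != 0:
        raise ValueError("octet-stream length is not a multiple of 4")
    return [w for (w,) in struct.iter_unpack(">I", octets)]
```

Everything downstream of words_to_octets() sees ordinary octets, so no new
content-transfer-encoding is involved.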

Compilers were once thought to be nearly impossible to write,
until (among other things) we learned to separate lexical
analysis from parsing, which turned out to make the task much
cleaner and more tractable.  In an analogous way, I'd like to
keep transfer encoding issues clearly separated from character
set issues,   ...


Agreed.  That's why all MIME objects should have a canonical form expressed
in terms of a single, simple data structure.  That way you can "plug in"
whatever encoder works without having to worry about whether it works with
your chosen byte size.  (Well, you do have to worry about transparency
issues, but that's bad enough without having to deal with byte size issues
also.)
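
To make the "plug in" point concrete, here is a small sketch using Python's
standard base64 and quopri modules.  The ENCODERS table and round_trip()
helper are hypothetical names introduced for the example, and the sample
octets are arbitrary.  The same canonical octet-stream goes through either
transfer encoding, and neither encoder knows or cares that the octets
happen to represent 32-bit quantities.

```python
import base64
import quopri

# Arbitrary sample: two 32-bit values already mapped to octets.
canonical = b"\x00\x00\x00\x41\x00\x00\x4e\x2d"

# Each transfer encoding is just an (encode, decode) pair over octets.
ENCODERS = {
    "base64": (base64.b64encode, base64.b64decode),
    "quoted-printable": (quopri.encodestring, quopri.decodestring),
}

def round_trip(name, octets):
    """Encode and then decode the octets with the named transfer encoding."""
    encode, decode = ENCODERS[name]
    return decode(encode(octets))

for name in ENCODERS:
    # Both encodings hand back the original octet-stream unchanged.
    assert round_trip(name, canonical) == canonical
```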

The chosen data structure for canonical form of MIME objects is an
octet-stream.  It could have been something else -- like an ASN.1-defined
stream encoded in BER.  But for better or worse, we didn't take that 
approach.

As a result, MIME is less complex to implement.

Keith Moore