Re: TEXT version of Draft RFC


Nathaniel, Ned, 

After reading the last revision of the draft document, I have noticed
several missing points.

As I understood the original proposal, a user agent had to implement
#all# the encoding types in order to conform to the protocol and
interoperate. This was to allow a sender to choose the most
efficient encoding without regard to the abilities of the receiver.
Neither this text nor the related discussion appears in the document.
This is, I believe, why there are proposals for so many new
encoding types!

Second, and related to the first point, there was discussion about
having a "mandatory" encoding type for text-like stuff, so that a
minimal user agent designed only for text processing would not need
the encoding/decoding complexity.  Consensus was not reached. To clip
from the minutes:

 A strawman poll was taken with the following options.


  1. Body part ``a'' must be sent with encoding type ``y''
  2. Body part ``a'' should be sent with encoding type ``y'', but may be
     sent with any encoding x,y,z
  3. Body part ``a'' can be sent with any encoding x,y,z
  4. Body parts a, b, c can be sent in any encoding x,y,z except for
     body part ``d'' which must be sent in ``x''


 There was no majority, with most expressing preference for (2), and
 equal number expressing either (3) or (4).

This needs to be addressed in the document.

Third, none of the content-types is for 8-bit text.  Only ASCII is
specified as a defined content type.  I realize that this is not yet
settled on the mailing list, but it would be nice to have at least a
strawman available for other content-types, as well as examples.
Possible examples include the 8859-n family, the 2022 family, the 646
national variant family, and the 10646 set.

It is not necessary to pick a #standard# way to encode character
sets, but it would be very useful to at least demonstrate the various
encodings.  


Now, I have one very strong feeling in terms of selecting a character
set.  One of the hallmarks of Internet protocols is that they are
implementable as written.  Profiling documents should not be required
for one implementation to work with another.  In the case of this
document, this is no longer true.  I can implement this RFCXXXX, and
not be able to send mail to another person implementing this document,
even if we are using the same language, unless we have a prior
agreement about what character set we are using.  I can write French
in Latin 1, a 2022 variant, or in Unicode, and unless all
implementations have support for all possible character sets, I cannot
count on interoperating.
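
To make the problem concrete, here is a small sketch in Python (my
choice of language for illustration, not anything from the draft).  It
encodes one French word in Latin 1 and in a 16-bit Unicode form.  The
"utf-16-be" codec is an assumed stand-in for 16-bit Unicode, and no
2022 variant is shown.

```python
# Illustrative only: the same French word under two of the character
# sets named above. "utf-16-be" stands in for 16-bit Unicode here,
# which is an assumption; no 2022 variant is shown.
text = "Français"

latin1_bytes = text.encode("latin-1")      # one octet per character
unicode_bytes = text.encode("utf-16-be")   # two octets per character

print(latin1_bytes.hex(" "))
print(unicode_bytes.hex(" "))

# A receiver that assumes the wrong character set decodes the octets
# without any error, but does not get the sender's text back.
misread = latin1_bytes.decode("utf-16-be")
print(misread == text)
```

The octets on the wire differ, and the receiver has no way to tell
which interpretation was intended unless the character set is labeled
or agreed on beforehand.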

That said, I'd like to see a "common" character set defined in this
document.  At this point, that seems to point to either 10646 or
Unicode.  Both have their disadvantages, but they are implementable.
Use of other codes would also be acceptable.

Another option is to specify character sets for specific domains of
use, e.g. Romance languages use Latin-1.  I think this gets very messy
very fast.
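
One example of the mess, again as a hedged Python sketch: the helper
can_encode is hypothetical, and the two sample words are my own.
Romanian is a Romance language, yet its "ă" is not in Latin-1 (it is in
Latin-2), while the French "è" is in Latin-1 but not in Latin-2.

```python
def can_encode(text, charset):
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Sample words chosen for illustration only.
samples = {"French": "très", "Romanian": "română"}

for language, word in samples.items():
    # Report whether the word fits in Latin-1 and in Latin-2.
    print(language,
          can_encode(word, "latin-1"),
          can_encode(word, "iso8859_2"))
```

So even a rule as simple as "Romance languages use Latin-1" already
needs an exception.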

Comments?