Although I probably am not viewed as a "highly experienced email developer,"
I would like to say a few things about NULLs and the use of Content-Transfer-
Encodings in MIME.
First, on the NULL issue: the problem seems to be moot for email TRANSPORT
and will soon exist only in user agents. Zmailer, Smail, and PMDF are all
binary-transparent. Software.com's SMTP implementation is also
binary-transparent (uses counted strings). Also sendmail 8.7 will handle
NULL (in the message body only) and arbitrarily long lines.
If NULL were ever to be removed from the specs, it should have been done 10
years ago when RFC 821 and RFC 822 were still New Things.
I also have a few (probably unique :) thoughts on the 7bit/8bit/binary
issue which have been bothering me for a while now:
- While MIME tries to be a transport-independent spec,
it fails at this because of the close ties to SMTP
transport limitations. RFC 822 does not specify a
line length limit of 1000 characters for messages.
Thus specifying 7bit and 8bit as consisting of lines
no longer than 1000 characters in MIME is not
backward-compatible with RFC 822.
- Having three "encodings" (Content-Transfer-Encoding)
that mean basically "no encoding was performed"
indicates a design flaw in MIME. (The information
they convey IS necessary, but belongs elsewhere.)
- The only values that should exist for C-T-E are "base64"
and "quoted-printable" since an encoding has actually been
performed (also x-token if an inverse operation needs to
be performed). If no encoding was done, or if no TRANSPORT
has occurred, requiring a C-T-E label for every MIME message
makes no sense. 7bit, 8bit, and binary should not be called
C-T-Es; they are simply attributes of the "text" or other
object in that MIME section.
- The "text/" hierarchy already has an implicit means
of specifying whether the text is 7bit, 8bit, or
binary in nature -- that being the "charset" that goes
along with it. Since a UA may not recognize every
charset, a new parameter for text types could be
defined that indicates the structure of the text:
    Content-Type: text/plain; charset=us-ascii;
        structure=7bit
or
    Content-Type: text/plain; charset=UNICODE;
        structure=binary
- If 7bit, 8bit, and binary labels for structure are found
to be too ambiguous (I think so), then how about this:
    Content-Type: text/plain; charset=iso-8859-1;
        width=85; range=1-255
The default range (octets used by charset) would be 0-255
so that as we move to a true international character set,
the range can be left off. The width simply states the
maximum width of a "line" in the message, where a line is
defined as "a string of octets through a CR LF pair." Since
this number only makes sense for CRLF-terminated text, it is
optional. If it is missing, the User Agent should use either
base64 or quoted-printable encoding when using SMTP to
transport the message (unless it determines the text is
SMTP-safe).
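
To make the width/range idea concrete, here is a rough sketch in Python of
how a user agent might compute the two values and decide whether to encode.
The function names are made up for illustration, and several points are my
own assumptions rather than anything in a spec: width counts the CR LF pair,
range is taken from the octets actually present (not from the charset), and
"SMTP-safe" means no NULL, no octet above 127, and no line over the 1000
character limit of RFC 821. Bare CR or LF is not checked.

```python
from typing import Optional, Tuple

SMTP_MAX_LINE = 1000  # RFC 821 text-line limit, including the CR LF


def describe_body(body: bytes) -> Tuple[Optional[int], Optional[Tuple[int, int]]]:
    """Return (width, (low, high)) for a message body.

    width is None when the body does not end in CR LF, i.e. when it is
    not a sequence of CRLF-terminated lines. The range is None only for
    an empty body.
    """
    octet_range = (min(body), max(body)) if body else None
    width = None
    if body.endswith(b"\r\n"):
        # Drop the empty piece after the final CR LF, then count each
        # line's octets plus the 2 octets of its CR LF pair.
        lines = body.split(b"\r\n")[:-1]
        width = max(len(line) + 2 for line in lines)
    return width, octet_range


def needs_encoding(body: bytes) -> bool:
    """Guess whether base64 or quoted-printable is needed for SMTP."""
    if not body:
        return False
    width, octet_range = describe_body(body)
    if width is None or width > SMTP_MAX_LINE:
        return True
    low, high = octet_range
    # Assumed definition of "SMTP-safe": no NULL and nothing above 127.
    return low == 0 or high > 127


def content_type_header(charset: str, body: bytes) -> str:
    """Build the proposed header; width and range are hypothetical parameters."""
    width, octet_range = describe_body(body)
    params = [f"charset={charset}"]
    if width is not None:
        params.append(f"width={width}")
    if octet_range is not None:
        params.append(f"range={octet_range[0]}-{octet_range[1]}")
    return "Content-Type: text/plain; " + "; ".join(params)
```

For example, content_type_header("iso-8859-1", b"caf\xe9\r\n") gives
"Content-Type: text/plain; charset=iso-8859-1; width=6; range=10-233",
and needs_encoding() returns True for that body because of the 0xE9 octet.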
Well I just had to get that off my chest. I hope I don't get the same
treatment as Ohta-san for not agreeing with everyone :)
Michael D'Errico
Software.com