I am impressed with the robustness of the encoding schemes in MIME and
gratified to see the emphasis on compatibility.
Thanks for the kind words.
However, I see one weak
spot that, I think, could be filled quite easily. I imagine that quite
a lot of MIME-compliant traffic will go as quoted-printable or 7-bit,
rather than as base64, because of the ability of a non-MIME receiver to
read "printable" text and because straight ASCII text can be transmitted
faster without encoding. For that reason, among others, I am interested
in increasing the robustness of quoted-printable and 7-bit text. In
particular, I see a need for one small augmentation to the protocol to
cover possible (and, in fact, likely) gateway translation problems. The
note on page 15 of the (PostScript) draft points out that 14 specific
characters are prone to garbling at certain gateways and observes that
encoding those characters would improve the reliability of transfer.
However, the readability (not to mention the speed of transmission)
would suffer if lots of characters were encoded, and I think most people
would simply not do that. It seems to me that the protocol can easily
correct any garbling by including one extra header line that looks
something like this:
Content-key: !"#$@[\]^`{|}~
The sender inserts that line with the 14 "endangered" characters in
that order, and the receiver can not only tell whether the message
has been garbled, but also correct it (by translating all instances
of any unexpected characters that arrive in that line into the expected
ones).
This is an interesting idea; I don't recall anyone else suggesting it before.
I agree that such a mechanism might help detect errors in the translation
of quoted-printable. However, I don't see much hope for using it as a
scheme for error correction.
In my experience, when gateways decide to mess up character translation, they
usually do it by mapping a group of characters into a single character. One
favorite of mine is the group of gateways that map {|}~ into spaces. And
when this happens, as it sometimes does, there is no way to correct it.
The correction aspect of this scheme can only work when you're sure that
none of the garbled characters have been mapped into something that clashes
with existing content. And you can never be sure of this on the receiving
end. For this reason I don't think the correction aspect of this scheme is
very strong. But it definitely might help catch errors. On the other hand,
other schemes, notably various sorts of message integrity checks, handle
this case pretty well. The only reason that there's no MIC in MIME is that
we could not reach closure on what MIC we wanted.
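To make the detect-versus-correct distinction concrete, here is a minimal Python sketch of how a receiver might check the proposed Content-key line. The 14 characters and their order come from the proposal; the function names analyze_content_key and correct_body, the status labels, and the assumption that a gateway alters only those 14 characters are hypothetical. The sketch offers safe correction only when the gateway has merely permuted the 14 characters, marks a one-to-one mapping onto outside characters as risky (genuine occurrences would be "corrected" too), and refuses outright when characters collapse together, as in the {|}~-to-spaces case.

```python
# Minimal sketch of a receiver-side check for the proposed Content-key line.
# The 14 characters and their order come from the proposal; the function
# names and status labels are hypothetical. Assumes the gateway changes only
# these 14 characters and leaves everything else alone.
EXPECTED_KEY = '!"#$@[\\]^`{|}~'


def analyze_content_key(received, expected=EXPECTED_KEY):
    """Classify a received Content-key value; return (status, mapping).

    mapping translates received characters back to the expected ones; it is
    empty unless the status allows some form of correction.
    """
    if received == expected:
        return "intact", {}
    if len(received) != len(expected):
        # Too many or too few characters: garbling detected, no correction.
        return "wrong-length", {}
    if len(set(received)) < len(received):
        # Several endangered characters arrived as the same character
        # (e.g. {|}~ all mapped to spaces): the gateway's translation is
        # many-to-one and cannot be inverted.
        return "collapsed", {}
    mapping = {got: want for got, want in zip(received, expected) if got != want}
    if set(received) != set(expected):
        # One-to-one, but some characters now lie outside the set, so genuine
        # occurrences of them in the body would be "corrected" as well.
        return "correctable-with-risk", mapping
    # The gateway merely permuted the 14 characters among themselves.
    return "correctable", mapping


def correct_body(body, mapping):
    """Apply the reverse translation to every character of the body in one pass."""
    return body.translate(str.maketrans(mapping))


# Example: a gateway that swapped [ and ].
status, mapping = analyze_content_key('!"#$@]\\[^`{|}~')
print(status)                             # correctable
print(correct_body("a]0[ = x", mapping))  # a[0] = x
```

Because str.translate rewrites each character in a single pass, swapped pairs such as [ and ] do not cascade into one another.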
The correction step, of course, can be skipped unless garbling has
actually occurred. One might argue that these 14 characters should be
extended to 21 to include all characters not in the X.400 Printable
String list, i.e., adding %&*;<>_ either tacked onto the end or
interleaved. I've never run into a situation where any of those seven
characters was messed up, so I can't tell if it would pay to include
them. The 14 characters listed include all 12 characters allotted for
national variants on ISO 646, and scrutiny of the IBM corporate
standard character sets shows that those 14 characters are the only
ones (of the 94 found in US-ASCII) that vary among the EBCDIC sets.
Of the less problematic characters you list, _ is the only one I've ever
gotten into trouble with.
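The character counts in this exchange (14 endangered characters, 21 outside X.400 PrintableString, the 7 extra being %&*;<>_) can be checked mechanically. This Python sketch assumes the X.400 PrintableString repertoire of letters, digits, space and ' ( ) + , - . / : = ?, and the 12 ISO 646 national-variant positions usually cited (# $ @ [ \ ] ^ ` { | } ~); the set names are illustrative only.

```python
import string

# The 94 graphic (non-space) characters of US-ASCII, 0x21 through 0x7E.
ASCII_GRAPHIC = {chr(c) for c in range(0x21, 0x7F)}

# Assumed X.400 PrintableString repertoire: letters, digits, space, ' ( ) + , - . / : = ?
PRINTABLE_STRING = set(string.ascii_letters + string.digits + " '()+,-./:=?")

# Assumed: the 12 positions ISO 646 leaves to national variants.
ISO646_NATIONAL = set("#$@[\\]^`{|}~")

# The 14 "endangered" characters from the proposed Content-key line.
ENDANGERED = set('!"#$@[\\]^`{|}~')

not_printable = ASCII_GRAPHIC - PRINTABLE_STRING
extra_seven = not_printable - ENDANGERED

print(len(not_printable))             # 21
print("".join(sorted(extra_seven)))   # %&*;<>_
print(ISO646_NATIONAL <= ENDANGERED)  # True
```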
Anyhow, that's a relatively minor detail, as is the decision of what to
do if the header line comes through with too many or too few characters
(or, perish the thought, duplicates) -- presumably, the MUA would have
the option of sending an automatic error notice to the sender or simply
informing the recipient that the message may be (but isn't necessarily)
garbled. It's a fact of life that files can be messed up, whether by
unfortunate translation at a gateway or by truncation or outright
substitution, and I think every reasonable effort should be made to
detect such problems where prevention cannot be ensured. Note that a
non-MIME mail receiver on the wrong side of a translating gateway could
benefit from this extension and be able to correct the garbled
characters "by eye".
Frankly, it is the "by eye" aspect of this scheme that I like. Unlike
checksums it might actually be useful in an unextended mail reader
environment.
I'm sure there are a number of other details that I haven't thought of,
but the idea is basically very simple and easy to implement, and I
think it would help to make MIME much more "bulletproof."
I think this scheme needs to be considered along with other checksumming
approaches when the time comes to put that in. Right now we're all too
exhausted to do this work, I think! But keep in touch; this group will
eventually have to deal with the shelved MIC issues.
Ned