Paul Vixie writes...
> warning: i have been spoiling for a fight all day :-).
Thanks for the warning.
I think your position is [now] clear. Let me clarify one element of
mine...
>> The only thing that is wrong with doing it [vixie's] way is that we have
>> moved beyond simple character mail and into very complicated stuff in
>> which it may be very hard to guarantee "straightforward 1-to-1
>> mappings".
> "very hard" is not relevant.
Interestingly, I had intended the important word there to be
"guarantee". I agree that "very hard to implement" is not an
appropriate design criterion if the result is sufficiently important.
What I am concerned about is the matter of "guarantees" or, if you will,
of third-party decisions that some process can be trusted.
Interestingly enough, from that perspective, one of your examples
illustrates the point. You say...
> you can represent any data structure in a
> 7-bit-wide data stream. look at C-language source code for examples.
> look at uuencode and atob for more examples.
And I would respond that there are many known cases of uuencoded files
moving across the extended mail internet and, when decoded, ending up
in a form that isn't the same as the way they started. That isn't a
straightforward 1-to-1 mapping. Now, from the standpoint of "hard to
implement", the problem is a small matter of programming--either by the
use of coding tables that don't get trashed or by the even safer
solution that Nathaniel adopted, which is to use Base64 and a very
carefully chosen set of coding characters instead. But from the
standpoint of "guarantees", the fact that you cite uuencode as an
example suggests to me that the "hard to guarantee" assertion is
plausible, possibly even valid.
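To make the uuencode hazard concrete, here is a sketch (in Python, purely
illustrative, and obviously not the tooling anyone in this discussion is
using): uuencode's coding alphabet starts at the space character, so runs of
zero bits come out as blanks, including trailing blanks, which exactly the
sort of gateway we have been discussing will happily strip. Base64's
carefully chosen alphabet contains no blanks at all.

```python
import binascii
import base64

data = b"\x00\x00\x00"  # three zero bytes: the worst case for uuencode

uu = binascii.b2a_uu(data)    # uuencode maps the 6-bit value 0 to a space
b64 = base64.b64encode(data)  # Base64 maps 0 to "A"; alphabet is A-Za-z0-9+/

print(uu)   # trailing blanks, easily trashed by a whitespace-stripping relay
print(b64)  # b'AAAA' -- nothing for such a relay to damage
```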
> (such a design would include an application-level checksum
> for all the reasons clark et al have outlined in their end-to-end paper.)
Aha. A wonderful idea. But here is where, from my "cut off the
problems at the source" perspective, we come full circle. Let's suppose
we go ahead and say "every relay should be able to convert". And, in
lieu of an external guarantee mechanism, we use application-level
checksums. Let's further assume (so as not to complicate the problem)
that we can design such a checksum so that it is not excessively
sensitive to the blank-and-tab trashing and adding problems that we run
into with mail moving across the extended internet: hard, but clearly
not hard enough that it should become a design criterion. Ok, the
message gets to the far end, and gets decoded, and the checksum says
"nope, got damaged". Now the internal guarantee method (checksum) has
informed you that you have received damaged mail. What are you going to
do with it? Bounce it, perhaps? Deliver it anyway with a note that it
may be trashed beyond recognition? Deliver it *and* bounce it in the
tradition of at least one popular MTA out there, creating some
fascinating loop-potential?
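For what it's worth, the kind of checksum I am assuming above -- one not
excessively sensitive to blank-and-tab trashing -- might look like this
sketch (Python, hypothetical; the normalization rules are my own assumption
about what the relays are known to mangle):

```python
import hashlib
import re

def tolerant_checksum(text: bytes) -> str:
    """Checksum that ignores the blank/tab damage relays commonly inflict:
    runs of blanks and tabs collapse to one blank, and trailing whitespace
    on each line is discarded, before hashing what is left."""
    normalized = []
    for line in text.split(b"\n"):
        line = re.sub(rb"[ \t]+", b" ", line).rstrip()
        normalized.append(line)
    return hashlib.sha256(b"\n".join(normalized)).hexdigest()

# A tab-expanding or trailing-blank-stripping relay no longer changes it:
original = b"some body text\twith tabs   \n"
mangled = b"some body text with tabs\n"
assert tolerant_checksum(original) == tolerant_checksum(mangled)
```

Note that this only moves the problem, as argued above: the checksum tells
you the mail was damaged; it does not tell you what to do about it.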
> i would be willing to punt close-readability, though we could optimize
> for the trivial (and common) case of 8-bit single-part text that just
> needs the 8th bit for accents and other non-ascii symbols.
Aha. But now we go around another circle. If we restrict conversions
to "8-bit single part text", I think I know how to write rules, I think
there are ways to certify decent behavior, and I generally have a lot
fewer problems. But then we have "convert text that happens to arrive
single-part, bounce multipart and/or multimedia". Bet that would make
some people unhappy.
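The trivial case is, at least, genuinely trivial: mostly-ASCII text that
only needs the 8th bit for accents maps cleanly, and reversibly, onto a
quoted-printable-style encoding, as this sketch (Python's quopri module,
used here only for illustration) shows:

```python
import quopri

# Latin-1 text that only needs the 8th bit for accents
body = b"na\xefve caf\xe9 visitors"

encoded = quopri.encodestring(body)
print(encoded)  # each 8-bit character becomes =XX; the rest stays readable

# And the mapping really is 1-to-1: decoding restores the original exactly.
assert quopri.decodestring(encoded) == body
```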
But that isn't what you said, of course. You went on to say...
> anything else could just be bitblasted into atob or uuencode or whatever
> structure-dependent format people are currently considering.
But to "bitblast" into one of these formats is exactly what people
have been shooting down for the last several days. It implies, at least
to me, that one is willing to say "ok, this is lots more complicated
than single-part text, encapsulate the whole message". But "encapsulate
the whole message" implies content-encoding at the top level, which, we
have been told, hopelessly complicates UAs.
The alternative isn't "bitblasting into ..." (presumably Base64). It
is parsing the message, finding each content part separator, making
individual decisions about the correct way to encode each body part, and
then, potentially, using a different transport encoding for each one.
*That* isn't easy (though, again, "hard to implement" alone is not a
design criterion). It also isn't easy to
guarantee that someone will do it right, or to determine in a robust and
survivable (i.e., you don't end up bouncing things, which is what you
are trying to avoid) way that it has been done right. And the second
is, IMHO, a reasonable design criterion.
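For the record, the per-part decision I mean is roughly the following
(a Python sketch; the thresholds are my own assumptions, not anything
sanctioned by the drafts under discussion):

```python
def choose_encoding(body: bytes, max_line: int = 998) -> str:
    """Pick a transfer encoding for one body part.

    Heuristic (an assumption, not gospel): pure short-lined ASCII can
    pass as-is; mostly-ASCII text reads best as quoted-printable;
    anything binary-heavy goes to base64.
    """
    high = sum(1 for b in body if b > 127 or b == 0)
    lines = body.split(b"\n")
    if high == 0 and all(len(line) <= max_line for line in lines):
        return "7bit"
    if high / max(len(body), 1) < 0.17:
        return "quoted-printable"
    return "base64"

assert choose_encoding(b"plain ascii text") == "7bit"
assert choose_encoding(b"caf\xe9 and other mostly ascii text") == "quoted-printable"
assert choose_encoding(bytes(range(256))) == "base64"
```

Every relay would have to run something like this, correctly, on every
part of every multipart message -- which is the point about guarantees.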
> either we design it correctly or we don't. either the design is implemented
> correctly or it isn't. i don't understand "hard" as a design-criteria here.
The reference is to Murphy's law of bug-free very complex systems.
Bug-free simple systems are lots more probable in practice. And there
is a missing entry on your list between "design correctly" and
"implement correctly" and that is "specify correctly, and well and
unambiguously so that everyone understands it the same way". Another
variation on Murphy's law says that more complex models have more
failure points in that area too.
And, as my evening cheap shot, since you were looking for a fight and
cited C source code and its transportability, it is possibly worth
pointing out that this putatively simple and straightforward,
easy-to-define language has just set a record in the national and
international standards community. In the relatively short time since
the C Standard was completed, there have been more identified
ambiguities and more formal requests for interpretation than for any
other standardized programming language in history, over the entire
lifespans of those languages. More than Algol or Pascal, more than FORTRAN or BASIC,
more than COBOL or APT, more even than Ada or PL/I. And I think that
is an observation about the "hardness of guaranteeing...".
--john