Paul Vixie writes...
> warning: i have been spoiling for a fight all day :-).
Thanks for the warning.
I think your position is [now] clear. Let me clarify one element of
mine...
>> The only thing that is wrong with doing it [vixie's] way is that we have
>> moved beyond simple character mail and into very complicated stuff in
>> which it may be very hard to guarantee "straightforward 1-to-1
>> mappings".
> "very hard" is not relevant.
Interestingly, I had intended the important word there to be
"guarantee". I agree that "very hard to implement" is not an
appropriate design criterion if the result is sufficiently important.
What I am concerned about is the matter of "guarantees" or, if you will,
of third-party decisions that some process can be trusted.
Interestingly enough, from that perspective, one of your examples
illustrates the point. You say...
> you can represent any data structure in a
> 7-bit-wide data stream. look at C-language source code for examples.
> look at uuencode and atob for more examples.
And I would respond that there are many known cases of uuencoded files
moving across the extended mail internet and, when decoded, ending up
in a form that isn't the same as the way they started. That isn't a
straightforward 1-to-1 mapping. Now, from the standpoint of "hard to
implement", the problem is a small matter of programming--either by the
use of coding tables that don't get trashed or by the even safer
solution that Nathaniel adopted, which is to use Base64 and a very
carefully chosen set of coding characters instead. But from the
standpoint of "guarantees", the fact that you cite uuencode as an
example suggests to me that the "hard to guarantee" assertion is
plausible, possibly even valid.
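To make the uuencode hazard concrete, here is a sketch (in Python, purely
illustrative, and obviously not the tooling anyone in this discussion is
using): uuencode's coding alphabet starts at the space character, so runs of
zero bits come out as blanks, including trailing blanks, which exactly the
sort of gateway we have been discussing will happily strip. Base64's
carefully chosen alphabet contains no blanks at all.

```python
import binascii
import base64

data = b"\x00\x00\x00"  # three zero bytes: the worst case for uuencode

uu = binascii.b2a_uu(data)    # uuencode maps the 6-bit value 0 to a space
b64 = base64.b64encode(data)  # Base64 maps 0 to "A"; alphabet is A-Za-z0-9+/

print(uu)   # trailing blanks, easily trashed by a whitespace-stripping relay
print(b64)  # b'AAAA' -- nothing for such a relay to damage
```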
> (such a design would include an application-level checksum
> for all the reasons clark et al have outlined in their end-to-end paper.)
Aha. A wonderful idea. But here is where, from my "cut off the
problems at the source" perspective, we come full circle. Let's suppose
we go ahead and say "every relay should be able to convert". And, in
lieu of an external guarantee mechanism, we use application-level
checksums. Let's further assume (so as not to complicate the problem)
that we can design such a checksum so that it is not excessively
sensitive to the blank-and-tab trashing and adding problems that we run
into with mail moving across the extended internet: hard, but clearly
not hard enough that it should become a design criterion. Ok, the
message gets to the far end, and gets decoded, and the checksum says
"nope, got damaged". Now the internal guarantee method (checksum) has
informed you that you have received damaged mail. What are you going to
do with it? Bounce it, perhaps? Deliver it anyway with a note that it
may be trashed beyond recognition? Deliver it *and* bounce it in the
tradition of at least one popular MTA out there, creating some
fascinating loop-potential?
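For what it's worth, the kind of checksum I am assuming above -- one not
excessively sensitive to blank-and-tab trashing -- might look like this
sketch (Python, hypothetical; the normalization rules are my own assumption
about what the relays are known to mangle):

```python
import hashlib
import re

def tolerant_checksum(text: bytes) -> str:
    """Checksum that ignores the blank/tab damage relays commonly inflict:
    runs of blanks and tabs collapse to one blank, and trailing whitespace
    on each line is discarded, before hashing what is left."""
    normalized = []
    for line in text.split(b"\n"):
        line = re.sub(rb"[ \t]+", b" ", line).rstrip()
        normalized.append(line)
    return hashlib.sha256(b"\n".join(normalized)).hexdigest()

# A tab-expanding or trailing-blank-stripping relay no longer changes it:
original = b"some body text\twith tabs   \n"
mangled = b"some body text with tabs\n"
assert tolerant_checksum(original) == tolerant_checksum(mangled)
```

Note that this only moves the problem, as argued above: the checksum tells
you the mail was damaged; it does not tell you what to do about it.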
> i would be willing to punt close-readability, though we could optimize
> for the trivial (and common) case of 8-bit single-part text that just
> needs the 8th bit for accents and other non-ascii symbols.
Aha. But now we go around another circle. If we restrict conversions
to "8-bit single part text", I think I know how to write rules, I think
there are ways to certify decent behavior, and I generally have a lot
fewer problems. But then we have "convert text that happens to arrive
single-part, bounce multipart and/or multimedia". Bet that would make
some people unhappy.
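The trivial case is, at least, genuinely trivial: mostly-ASCII text that
only needs the 8th bit for accents maps cleanly, and reversibly, onto a
quoted-printable-style encoding, as this sketch (Python's quopri module,
used here only for illustration) shows:

```python
import quopri

# Latin-1 text that only needs the 8th bit for accents
body = b"na\xefve caf\xe9 visitors"

encoded = quopri.encodestring(body)
print(encoded)  # each 8-bit character becomes =XX; the rest stays readable

# And the mapping really is 1-to-1: decoding restores the original exactly.
assert quopri.decodestring(encoded) == body
```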
But that isn't what you said, of course. You went on to say...
> anything else could just be bitblasted into atob or uuencode or whatever
> structure-dependent format people are currently considering.
But to "bitblast" into one of these formats is exactly what people
have been shooting down for the last several days. It implies, at least
to me, that one is willing to say "ok, this is lots more complicated
than single-part text, encapsulate the whole message". But "encapsulate
the whole message" implies content-encoding at the top level, which, we
have been told, hopelessly complicates UAs.
The alternative isn't "bitblasting into ..." (presumably Base64). It
is parsing the message, finding each content part separator, making
individual decisions about the correct way to encode each body part, and
then, potentially, using a different transport encoding for each one.
*That* isn't easy (though, again, "hard to implement" alone is not a
design criterion). It also isn't easy to
guarantee that someone will do it right, or to determine in a robust and
survivable (i.e., you don't end up bouncing things, which is what you
are trying to avoid) way that it has been done right. And the second
is, IMHO, a reasonable design criterion.
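For the record, the per-part decision I mean is roughly the following
(a Python sketch; the thresholds are my own assumptions, not anything
sanctioned by the drafts under discussion):

```python
def choose_encoding(body: bytes, max_line: int = 998) -> str:
    """Pick a transfer encoding for one body part.

    Heuristic (an assumption, not gospel): pure short-lined ASCII can
    pass as-is; mostly-ASCII text reads best as quoted-printable;
    anything binary-heavy goes to base64.
    """
    high = sum(1 for b in body if b > 127 or b == 0)
    lines = body.split(b"\n")
    if high == 0 and all(len(line) <= max_line for line in lines):
        return "7bit"
    if high / max(len(body), 1) < 0.17:
        return "quoted-printable"
    return "base64"

assert choose_encoding(b"plain ascii text") == "7bit"
assert choose_encoding(b"caf\xe9 and other mostly ascii text") == "quoted-printable"
assert choose_encoding(bytes(range(256))) == "base64"
```

Every relay would have to run something like this, correctly, on every
part of every multipart message -- which is the point about guarantees.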
> either we design it correctly or we don't. either the design is implemented
> correctly or it isn't. i don't understand "hard" as a design-criteria here.
The reference is to Murphy's law of bug-free very complex systems.
Bug-free simple systems are lots more probable in practice. And there
is a missing entry on your list between "design correctly" and
"implement correctly" and that is "specify correctly, and well and
unambiguously so that everyone understands it the same way". Another
variation on Murphy's law says that more complex models have more
failure points in that area too.
And, as my evening cheap shot, since you were looking for a fight and
cited C source code and its transportability, it is possibly worth
pointing out that this putatively simple and straightforward,
easy-to-define language has just set a record in the national and
international standards community. In the relatively short time since
the C Standard was completed, there have been more identified
ambiguities and more formal requests for interpretation than for any
other standardized programming language in history, over the entire
lifespans of those languages. More than Algol or Pascal, more than FORTRAN or BASIC,
more than COBOL or APT, more even than Ada or PL/I. And I think that
is an observation about the "hardness of guaranteeing...".
--john