ietf-822

binary transport vs 8bit character transport revisited (was: why it is a problem to transmit binary as binary in mail)

1991-09-13 06:31:05
Erik Fair writes:

> It is a rare day when I say what I am about to say:
>
> Bravo, Mark. Good argument, succinctly expressed.
   I certainly agree.

> So, can we dispense with the 8 bit and binary transport nonsense, and
> simply encode everything now, like we should have in the first place?

Is an interesting leap.  I took Mark's comments as a criticism of
attempts to support/incorporate *binary* transport--i.e., arbitrary bit
streams without line-length or similar limitations.  There, to put it
mildly, Mark and I, and presumably you, Erik, are in complete agreement.
Indeed, Mark's two notes were at least partially in response to ones
of mine, where I argued against getting involved with binary transport
from a different point of view, and much less well and clearly, but,
nonetheless, "against".

> Parochial as some may say this view is, I think it is important to
> affirm the 7bit-ness and ASCII-ness of SMTP, and leave it be. If you
> want to move arbitrary bundles of bits through SMTP, encode. Don't mess
> around with a well understood, working mail transport protocol & system.
  But there is an extra jump here, which I don't think Mark's 
discussion, or mine, justifies taking, and that is the jump to 
prohibiting an extension that would transport 8bit characters without 
further encoding.

  We really have four possible cases in the transport.  All but the 
first represent "change" and "messing around with a well understood, 
working...system".   Let's not kid ourselves by pretending otherwise.  
The differences among those three cases/options are whether or not we 
believe the transformations are safe, reasonable, needed, and worth the 
trouble.

Case 1: SMTP carries 7bit ASCII, period.
  Not 7bit national versions of ISO 646, not JIS 2022, and not a variety 
of other things that use the same bit patterns as ASCII.  (Octet 0x5C, 
for example, is a backslash in ASCII but a yen sign in JIS X 0201 Roman; 
the bits alone do not tell you which.)  That is what you get from a 
narrow reading of 821/822, i.e., the status quo.  Insofar as anything 
else works today, it is the result of care, robustness, good sense, and 
general good luck.  We should be thankful for all four.

Case 2: SMTP is extended (by fiat, rather than operational protocol
changes) to carry the 7bit-transportable forms of RFC-XXXX.
  In other words, the "7bit" restriction is retained but the "ASCII"
restriction is relaxed in some specific ways.  Despite some 
philosophical nervousness about the precedents this sets, and despite 
some concerns about what this might do to old and outdated UAs (a 
different topic, really), I haven't detected anyone who thinks this 
can't be made to work without problems.  But the reason is the care, 
robustness, good sense, and general good luck mentioned above, not 
that 821 anticipated and allowed for this case.
   This is what we call the "no transport changes, do everything in the 
mail headers and format" position.  While it involves no changes to the 
code of things that are conceptually performing as transport agents, it 
does involve a change to 821, even if only by creative retroactive 
interpretation.
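
  For concreteness, here is a minimal sketch of this sort of 7bit-safe 
encoding, written in modern Python purely as illustration (the code is 
mine, not anything from RFC-XXXX or an existing implementation), using 
quoted-printable-style escaping:

    import quopri

    # An 8bit (Latin-1) body that a strictly-read 821 transport cannot carry.
    body_8bit = "Tschüß, à bientôt".encode("latin-1")

    # Quoted-printable escapes each octet >= 0x80 as =XX, so the result
    # is pure 7bit ASCII and survives an unmodified SMTP path.
    body_7bit = quopri.encodestring(body_8bit)
    print(body_7bit.decode("ascii"))   # Tsch=FC=DF, =E0 bient=F4t

    # The transformation is reversible at the receiving UA.
    assert quopri.decodestring(body_7bit) == body_8bit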

Case 3: Expanding SMTP to support 8bit transport of character-and-line 
oriented information.
  Regardless of what one might think about the *necessity* of this --and 
we can debate that endlessly-- it does not raise the technical problems 
of sorting "binary streams" from "text lines" that lie (I think) at the 
technical heart of Mark's note and experience, nor the issues of parsing 
a catenation of quite different objects that require different sets of 
parsing rules, which was the perspective I was coming from.
  8bit transport for mail-like things (i.e., leaving the line definition
and length rules intact, preserving definitions of concepts like "blank
line" and "CRLF.CRLF") is really quite easy to handle in the traditional 
SMTP context.  Essentially, there are only two sources of difficulty and 
neither, strictly speaking, is a problem with the transport.
   (3.1) How does one determine that it is ok to use 8bit character 
transport?  We've seen two answers to this question posed; everything 
else has been a small variation on one or the other.
  ->  One answer is to make changes/extensions to the 821 envelope
protocol that would explicitly solicit agreement from the receiver
before sending (an exchange of this sort is sketched after these two 
answers), relying on the experience that SMTP (821) servers will
reject verbs that they have never seen before.  Since it is possible for
reasonable people to read 821 as not requiring that behavior (even
though I think that reading is perverse), this requires much the same
leap of faith in the care, robustness, etc., of existing implementations
that Case 2 does.  Fortunately, as with Case 2, we have been unable to
locate counterexamples, nor to come up with plausible stories about
where such counterexamples might exist so we can go looking for them.
Indeed, it can be argued that this is a little bit (I agree, a *very*
little bit) safer than Case 2, because the server must agree (or at
least appear to agree) that what is about to be sent to it is not
(narrowly read) 821/822. 
  -> The other is to extend exactly the "care, robustness,..." argument
used to justify Case 2 by claiming that any careful and robust and
sensible implementation would be "eight bit clean" anyway, and pointing
to several that are.  Then extend that argument with an assertion that
anything that is not robust at the "eight bit clean" level should be,
and should be required to be.  Other than some occasional flashes of general
arrogance about the process of altering protocols, this is the essence
of the argument for "just send 8bit" that has popped up several times.
However, with it, unlike Case 2 or the "821 envelope change" approach
above, we can identify specific real-world situations where it will
cause harm, and we can construct models that predict how to look for
other such cases.
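
  To illustrate the first answer above (the verb and reply text here 
are hypothetical, invented for this note; nothing of the sort has yet 
been standardized), an exchange with an old server might run:

    S: 220 old.example.com Simple Mail Transfer Service ready
    C: HELO new.example.com
    S: 250 old.example.com
    C: XEIGHTBIT        (hypothetical verb soliciting 8bit transport)
    S: 500 Syntax error, command unrecognized
    C: (falls back to 7bit transport, encoding as needed)

The 500 reply is exactly the "reject verbs never seen before" behavior 
the first answer relies on.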
  (3.2) There are nasty and complex problems when one starts examining 
how an 8bit [character] sender should interact with a 7-bit-only, 
near-ASCII, receiver.  It is possible to argue that this problem is so 
complex that we should prohibit 8bit senders, but Mark's (and my) 
arguments don't address that case.  The 8->7 problem has to do with 
whole messages, not parsing and dissection within messages.  There are 
three approaches to solving this problem, with everything else I've 
seen being a variation on one of the three or a different balance 
between the second and the third.  Note also that the second and third 
become the first if certain unprohibitable implementation choices are
made.
  -> Insist that messages that start over 8bit transport be carried over 
8bit transport and delivered to the destination mail server (with
various broad definitions of that term) with some unspecified or more-
or-less dramatic behavior (e.g., message bouncing) if it is not possible 
to negotiate an 8bit transport path.  This has been described as the 
"put the burden on the originating users or systems who want to use 
8bit transport" case.
  -> Provide that intermediate MTAs at 8->7 boundaries may make a
conversion to a 7bit transport form, and that they may make it in 
whatever fashion is easiest and most convenient for them.  This is what 
leads to the nested-encodings cases, and some other unfortunate 
problems.  It has been described as the "put the burden on the UAs" 
case.
  -> Provide that intermediate MTAs at 8->7 boundaries may make a 
conversion to a 7bit transport form, and that they must do this in a way 
that simulates what a sensible originating UA would have done if it knew 
that the message was going to travel over a 7bit transport path, or if 
it didn't have an 8bit transport path available at all (a sketch of this 
behavior appears below).  This implies somewhat more complex MTAs and 
intra-Internet gateways.  It has been described as the "put the burden 
on the MTAs" case.
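
  A minimal sketch of that third option, again in modern Python and 
purely illustrative (the function name is invented for this note, and 
the Content-Transfer-Encoding header is borrowed from the RFC-XXXX 
draft; treat the whole thing as an assumption, not a specification):

    import quopri

    def downgrade_to_7bit(headers, body):
        """At an 8->7 boundary, convert a message the way a careful
        originating UA would have, instead of opaquely wrapping the
        whole thing (which is what produces nested encodings)."""
        if all(octet < 0x80 for octet in body):
            return headers, body        # already 7bit; pass it through
        new_headers = dict(headers)
        # Escape only the 8bit octets, preserving line structure and
        # leaving the ASCII majority of the text readable as-is.
        new_headers["Content-Transfer-Encoding"] = "quoted-printable"
        return new_headers, quopri.encodestring(body)

A conversion done this way is one the receiving UA can undo, or often 
simply display, exactly as if the originator had sent it that way.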

Regardless of one's choices between the options for how 8bit character 
transport is (or is not) to be handled, and what choices one makes about 
in-transit conversions, it is important to remember part of what got
these WGs started. As RFC-ZZZZ and the late "architecture" document try
to point out, the demand for 8bit transport of character data is very
real.   There are experimental (sometimes unintentional) implementations 
out there, and there is a lot of 8bit-transport-over-SMTP floating 
around parts of the Internet.  Vendors have claimed that their customers 
insist on it, and that they see few or no interoperability problems and, 
to those vendors at least, those customers are a much more powerful
argument than any decision made by an IETF WG.
  And there *is* a technical argument for wanting to do this.  It is not 
a bandwidth argument ("the extra bit in the octet"), but an argument for 
simplicity in handling relatively short, simple, non-structured mail 
messages that happen to use non-ASCII characters and, especially, for 
doing so among hosts that make up a community (e.g., those for whom most 
mail travels in a specific language other than English).  Since most of 
the email messages in the world are of the "relatively short, simple, 
non-structured" persuasion, this is an important issue regardless of the 
bandwidth implications and may, by the sheer size of the multiplier, 
actually have bandwidth implications.
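
  To make the simplicity point concrete (the sample text and its Latin-1 
octet values are my own example, not anyone's proposal):

    Raw 8bit transport:            Voilà un résumé très détaillé
    Encoded for 7bit transport:    Voil=E0 un r=E9sum=E9 tr=E8s d=E9taill=E9

Within a community whose hosts all use the same 8bit character set, the 
first form is directly readable end to end; the second requires an 
encoding step at the sender and a decoding step in every receiving UA, 
for a message that needed no structure at all.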
  So, IMHO, "dispensing with the nonsense" of 8bit character transport 
is really not an option.  We have only two options here: we can create 
an approved way of doing 8bit character transport if people insist on
doing it and then try to cajole the experimental and outlaw alternatives 
into conforming, or we can expect that the experimental and outlaw 
alternatives will continue doing their thing.  The latter is an 
interoperability threat to the network; the former is, at worst, an 
interoperability problem for those who decide to use it.

And that brings us to...
Case 4: Binary transport, within mail envelopes, of data that are 
neither organized into lines nor interpretable as "characters" rather 
than "octets".
  That is the case that Mark, I, and some others have been arguing 
against on technical grounds, and for which we have yet to see any 
strong justification, technical or otherwise.

Conclusions...
  -> It is important to be very clear what we are talking about here, to 
not confuse character-and-line-oriented transport with "binary 
transport" and to not confuse "binary transport" with "the ability to 
carry binary data in encoded form".
  -> "8bit character transport" and "binary transport" are not the same 
thing.
  -> The "nonsense which should be dispensed with" line should be drawn
between Cases 3 and 4, not above Case 3.

And, with regard to the options under Case 3, we still need to see real 
proposals from the advocates of dynamic, in-transit, 8->7 conversions.  
Ned's grand compromise proposal provides a model for this, but, in the 
cases people have worked through on the list (or that I've seen in 
private notes), that model is most likely to come into play in the 
region of the destination mail server (e.g., it gets to "the gateway 
for my company" and I convert it there to an appropriate form for the 
target intra-company host) rather than somewhere out in the aether that 
is under the control of neither sender nor destination.
  Even if all of the in-transit conversion options turned out to be 
impractical, we would still have "responsibility of the originating 
system" as an option, and some people would want to use 8bit character 
transport that way.  The worst thing that can be said about that is that 
there is no accounting for taste; it does not pose any threats to the 
7-bit-forever world :-).

> If the extra bit in the octets is *that* important to you, then we need
> to talk about a new protocol, on a new TCP port, especially optimized
> to move lots of data very fast with as few end-to-end turn-arounds as
> possible (since it seems to be the argument that the efficiency gained
> by binary transport will become worth it when the messages get "large"
> ...
   I agree with the general tone of this argument as it applies to 
binary transport.  It is basically a variation of the case for a 
separate protocol for sender-initiated, password-free file transfer that 
is distinct from the common mail system.

> In my capacity as Postmaster for Apple Computer, Inc., I will shortly
> be issuing a TechNote to our developers (in house and third party)
> which describes a standard for encapsulating an arbitrary Macintosh
> file in Internet mail (with the caveat that the IETF may eventually
> ...
   And, given the nature of "arbitrary Macintosh file", this is one of 
the binary-data-encoded-in-mail cases for which the only alternative 
would be binary transport: 8bit character transport just wouldn't do, 
or would itself require some encoding.  And, as some of the 
"content-type" arguments have indicated, whatever wants to read, 
decode, and use this had probably better be able to do a rather good 
simulation of a Macintosh, independent of what encoding is chosen.
  However, Erik, I'll bet an ice cream cone, for delivery in either 
Cambridge or Cupertino, that, if an 8bit character transport extension 
is specified and standardized, Apple's European (and maybe Asian) 
customers will "persuade" you to implement it in under a couple of
years.   Unless, of course, by then, you have succeeded in eliminating 
all machines that are not Macintoshes from the Internet, in creating a 
mandatory Posix interface through which arbitrary Macintosh files can be 
read and written, or in convincing all Macintosh users that the "everyone 
else" in the world is not worth communicating with.  :-)

   --john