Leo J. McLaughlin III writes...
We *need* binary transport because [see text of messages from last
seven months]. Yes, I do intend to show/run arbitrary data (such as
bitmaps, lotus spreadsheets, and MS-word docs) without PEM. Yes, I do
intend to allow users to specify proper defaults for declining
information for particular applications and/or from particular hosts.
Leo,
I may be the only one who feels this way, so, if no one else speaks
up, I'll drop this line of inquiry with this message, which is intended
largely to clarify my (possibly unique) position.
The statement above makes a strong case that it is useful to be able
to transport binary information. But I don't think that has ever been
in doubt: as soon as one says "transport over mail is a reasonable file
transfer mechanism", then it follows almost immediately that one will
want to transport "binary" things.
What I'm still not grasping is why this, of necessity, implies binary
transport.
Maybe it is helpful to look at some history and a few cases.
Within the strict IP internet, we have done binary file transfer for
years with FTP. It works, and permits (plus or minus intra-system
idiosyncrasies which would appear with any binary transfer) showing/
running things on arrival. How safe that is depends on the relationship
between source and target systems, and I suppose it would be reasonable
to assert that most of us live a bit more dangerously than the truly
security-paranoid might encourage. Now in the FTP case, since data
transmission is out-of-band, what arrives is a file, ready to show/
run/whatever. But there are three major perceived restrictions with
FTP:
(i) I've got to be able to validate myself on the remote machine--
usually by having an account there--to do a "send",
(ii) both machines have to be functioning and reasonably well
connected at the same time-- there is no multihop store-and-forward
arrangement possible except in a few odd (and rarely supported)
third-party transfer situations-- and
(iii) it only works within the IP internet, not through, e.g., the
type of gateways that can handle mail.
The first of these restrictions is, I think, the source of the demand
for sender initiated, password-free, file transfer within the internet.
That doesn't necessarily have anything to do with mail: it is pretty
easy to "fix" FTP to support sender-initiated password free transfers.
Some of the "fixes" have been kicked around and even implemented; a few
don't even require changes to the protocol itself and are moderately
secure.
The third is the source of the desire to mail this stuff.
And, depending on one's goals, interests, and ways of looking at
things, the second problem is either part of the third (i.e., "we know
how to deal with this in mail") or an argument for much more
sophistication--e.g., greater capability for batching, automated retry,
checkpoint/restart, and possibly more support for third-party transfer--
in FTP.
The thing that joins the three problems is the perception that they
are all the same (depends on how you look at the world) and that one
should minimize the number of mechanisms the user needs to understand (I
can certainly agree with that one).
There is one other historical problem with FTP implementations
(probably not the protocol), and that is that we tend not to carry
around quite enough information to describe a non-text file that is
being moved around the network. Our file descriptors have gotten a lot
more complicated than they were fifteen years ago, and FTP
implementations have not kept up. The problem is isomorphic with trying
to figure out what might be required to follow "content-type: binary/"
in an RFC-XXXX header so the file can be sensibly reconstructed.
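To make the parallel concrete, here is a minimal sketch of the kind of descriptive record that would have to ride along with a non-text file so the receiver can rebuild it. All of the X-File-* field names below are invented for illustration; nothing here is part of FTP, RFC-XXXX, or any proposal on the table:

```python
# Hypothetical sketch: descriptive fields a non-text file transfer would
# need to carry, serialized as RFC-822-style header lines.  Every field
# name here is invented for illustration only.
def format_descriptor(fields: dict) -> str:
    """Sender side: serialize descriptor fields as header lines."""
    return "".join(f"X-File-{name}: {value}\r\n" for name, value in fields.items())

def parse_descriptor(text: str) -> dict:
    """Receiver side: recover the descriptor fields."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("X-File-"):
            name, _, value = line[len("X-File-"):].partition(": ")
            fields[name] = value
    return fields

descriptor = {"Name": "budget.wk1", "Record-Format": "fixed",
              "Record-Length": "512", "Origin-System": "VMS"}
assert parse_descriptor(format_descriptor(descriptor)) == descriptor
```

The point of the sketch is how quickly the field list grows once one tries to cover the file models of more than one operating system.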
Now, because it represents a major collection of experience with
implementation and long-term use, let's take a look at BITNET. They
have had sender-initiated password free file transfer since (their) day
one. To all intents and purposes, mail rides on top of it, rather than
vice versa. Those files arrive on my BITNET machine identified as
file-objects, not as mail-objects. That identification does not occur
as a consequence of decoding a mail header, or even as a result of
decoding a *mail* transport envelope: it is in the *real* transport
envelope; in the Internet's layering, it would be the near-equivalent of
the TCP header carrying the information. That outermost envelope also
contains the file name, which is pretty handy. Like FTP, it also
doesn't work across mail-type gateways (there have been intra-BITNET
discussions about using RFC-XXXX as a transport for moving BITNET file
transfers across the internet between NJE networks).
Partially as a consequence of the fact that the transport layer is
pretty exposed with this stuff, and the absence of Mail transport agents
on the sending side (the user typically passes the file directly to the
low-level transport agent), the state of validation of the sender is
typically a tad higher than with mail. Not "secure" or "authenticated",
but it takes a bit more to fake a message.
If one of those files arrives on my machine, and is "binary", what I
do depends on all the usual stuff. I'm not as paranoid in practice as I
am in theory. If I know the sender, and have reasonable confidence that
the file isn't somehow forged, and was expecting that sender to send me
something, then I may execute the thing. My confidence is going to be
higher if the file originates within "my" network than if it originates
somewhere that I've never heard of. And so on. The less confidence I
have about the file, the more validating I'm going to do before trying
to execute/show/run it.
So one could implement a close approximation to the BITNET capability
over FTP or over some new protocol that might provide, even, for
intra-internet store and forward. While the argument is less strong, I
don't think those solutions (especially the latter) fly for the same
reasons that we stopped talking about "new mail" on new ports. But the
important thing is that we are still intra-internet. If the "need" is
for that, or for rapid and clutter-free movement of data among a
collection of mutually trusted sites (a subnet somewhere?), then one
might be able to decide that those sites were all interoperable at the
binary-transport level and avoid the "need" for conversion gateways. It
is mostly the conversion gateways, and the consequent RFC-XXXX
implications, that bother me, not the binary transport.
Ah, but the third case applies, and you really want to send this binary
information through mail gateways. But, at those gateways, the
information is likely to undergo conversion of some form. It may have
to be turned into characters, it may have to go over a transport that is
native in some other character set, it may have to be forced into
"lines". And any of those things may require not only conversion of the
bits into, e.g., base64, but additional file descriptive information so
that the thing can be structurally reconstructed at the far end. The
ISO experience (and this is one of the areas where they *do* have
experience) is that this quickly takes you down the slippery slope
toward data descriptive languages, data element dictionaries, and other
complexities that I'd be very happy to keep out of a mail protocol that
works well precisely because it is pretty simple.
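For reference, the "turned into characters" step at such a gateway is essentially the following, sketched here with Python's standard base64 machinery (the 76-character line width is the conventional mail-safe limit; the function names are mine):

```python
import base64
import textwrap

def to_mail_safe(data: bytes, width: int = 76) -> str:
    """What an 8->7 gateway must do: turn arbitrary bytes into
    line-oriented base64 text that any mail transport can carry."""
    b64 = base64.b64encode(data).decode("ascii")
    return "\r\n".join(textwrap.wrap(b64, width))

def from_mail_safe(text: str) -> bytes:
    """Receiving side: strip the line structure and decode."""
    return base64.b64decode("".join(text.split()))

payload = bytes(range(256))          # arbitrary binary content
wire = to_mail_safe(payload)
assert from_mail_safe(wire) == payload
```

The bits survive; what base64 alone does *not* carry is any of the structural description discussed above, which is exactly where the slippery slope begins.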
And, because of the technical problems associated with all of this,
there is a very strong case to be made for end-to-end checksums.
Trashed executables probably just won't execute, but trashed database
updates...
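An end-to-end check is cheap to sketch: digest the original bytes on the sending side, before any gateway conversion, and verify after final decoding. The digest choice and function names below are mine, for illustration only:

```python
import base64
import hashlib

def wrap_with_checksum(data: bytes) -> tuple[str, str]:
    """Sender side: digest the *original* bytes, so corruption in any
    intermediate conversion is caught end to end."""
    digest = hashlib.md5(data).hexdigest()
    return base64.b64encode(data).decode("ascii"), digest

def unwrap_and_verify(encoded: str, digest: str) -> bytes:
    """Receiver side: decode, then refuse content that fails the check."""
    data = base64.b64decode(encoded)
    if hashlib.md5(data).hexdigest() != digest:
        raise ValueError("content corrupted in transit")
    return data
```

The trashed-database-update case is precisely the one where the decode succeeds, the bytes look plausible, and only the checksum tells you anything went wrong.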
By the time one gets through with all of that fussing around, any
performance advantages of having binary transport capability have gone
south.
Note that, if the message starts in structured RFC-XXXX, the user can
supply the descriptive information in another message part or otherwise
organize things properly (let me come back to that below).
So I'm having trouble understanding the need for--or the technical
feasibility and meaning of--binary transfer across mail gateways.
Within the Internet, "binary transport" still has a limited meaning
unless one does it with FTP or a new protocol. There have been no
proposals to change RFC822 or XXXX to remove the line-orientation of
headers, message part boundaries, and the like. So this is presumably
binary stuff embedded within very line-oriented messages. Untangling
that, including finding the content-type fields and subtypes and
interpreting them, isn't trivial--it isn't like picking up a BITNET
file, or an FTPed file, and just reading/showing/running it. One has to
parse the thing, using the lines where there are lines and, presumably,
treating the binary stuff as a very long line which ends before the next
(line-oriented) body part delimiter. And you need counts to find "end
of binary material", with all of the validity threats that exist for
counts.
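The count problem can be made concrete with a few lines. This naive extractor (names and framing invented for illustration; this is not a proposed parser) locates the binary part by byte count and then checks that the next thing really is the part delimiter:

```python
def extract_binary_part(message: bytes, boundary: bytes, length: int) -> bytes:
    """Naive sketch: pull a binary body part out of a line-oriented
    message by byte count, then cross-check the count against the
    delimiter position -- the 'validity threat' of counts."""
    start = message.index(b"\r\n\r\n") + 4      # end of the part headers
    body = message[start:start + length]
    tail = message[start + length:]
    if not tail.startswith(b"\r\n--" + boundary):
        raise ValueError("length field disagrees with boundary position")
    return body
```

Note that the binary body may itself contain bytes that look exactly like the delimiter, which is why the count is load-bearing: if it is off by even one, the parse either fails or silently misframes the data.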
Given that amount of parsing and decoding I'd guess that the marginal
costs of designing into RFC-XXXX, if you want it, an eight-bit
line-oriented model that would:
--be content-type: binary-plus (or maybe "binary")
--insert, as base64 does, non-significant CRLF sequences at
appropriate intervals.
--escape "real" CR LF sequences in some appropriate way that does not
get confused with the fake ones.
--if 8->7 conversions are to be attempted, invent "KHFO: base64NoCRLF"
to tell the converter that the "bare" CRLF sequences are to be discarded
in going to base64, not converted. "KHFO: base64NoCRLF" converts to
content-transfer-encoding: base64, not to something else special.
No transport changes other than the 8bit line-oriented ones and
fairly trivial (given everything else) changes to any gateways which you
would actually expect to do conversions (although I'm still dubious
about that).
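To show that the sketch above is mechanically simple, here is one possible realization of the escaping rules. The escape byte, the "L" marker, and the line interval are all invented for illustration; any real proposal would have to pick these carefully:

```python
# Hypothetical 'binary-plus' codec: escape real CRLF sequences, then
# insert non-significant CRLFs at fixed intervals so the transport still
# sees 'lines'.  ESC, the 'L' marker, and the interval are invented.
ESC = b"\xff"

def encode_binary_plus(data: bytes, interval: int = 64) -> bytes:
    # Escape ESC first, then real CRLFs, so the two never collide.
    stuffed = data.replace(ESC, ESC + ESC).replace(b"\r\n", ESC + b"L")
    chunks = [stuffed[i:i + interval] for i in range(0, len(stuffed), interval)]
    return b"\r\n".join(chunks)             # the fake, non-significant CRLFs

def decode_binary_plus(wire: bytes) -> bytes:
    joined = wire.replace(b"\r\n", b"")     # discard the fake CRLFs
    out, i = bytearray(), 0
    while i < len(joined):
        if joined[i:i + 1] == ESC:
            out += b"\r\n" if joined[i + 1:i + 2] == b"L" else ESC
            i += 2
        else:
            out += joined[i:i + 1]
            i += 1
    return bytes(out)
```

Because the escaping guarantees that no real CRLF survives in the stuffed data, the decoder can strip every CRLF it sees without ambiguity, which is the whole trick.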
--or--
You decide that the users with/for whom you "expect to show/run
arbitrary data..." are really operating in a mutually-trusted enclave
environment in which everyone is using the TCP/IP internet and running
mail software at the same capability level. If that is the case, then
perhaps we should be working out the details of no-conversion enclave
protocols for binary transport. Or we should be discussing a
sender-initiated file transfer protocol on a new port with arrangements
for transforming into mail-reader formats once the stuff arrives (if you
really want to do that).
What I'm trying to say, Leo, is that there is no question that being
able to transfer binary information over mail is a requirement. But I'm
convinced, based on some significant (however odd) experience and
perspective, that binary transport over SMTP is likely to turn out to be
a very poor engineering idea. That doesn't spell "impossible", so
showing me that you know how to do it in the strictly-connected
internet wouldn't prove anything.
So what I'm trying to find out is exactly how you expect to use this,
among whom, and why. And I'm trying to understand that in enough detail
to be able to evaluate, for myself, your implied claim that simply
stuffing the material into base64 or using some near-binary
system, as outlined above, is inadequate/unacceptable and why. And,
finally, if you can convince me/us that binary transport is really a
requirement, I'm trying to figure out if one has to convert that at 8->7
boundaries, and, if so, what those conversions might mean, both in terms of the
transformations and in terms of how much more complicated it makes the
gateways (marginally, over the requirements already implied by 'no
nested encoding' RFC-XXXX, of course).
As I said at the beginning, if I'm the only one with these concerns, I
hereby drop this line of discussion. So, if there are others out there,
let's hear from them.
A final observation, which won't go away, and [another] associated
strawman. The observation has been made as part of the "content-type"
discussion that we really don't understand subtypes of "binary", that,
in CCITT's dreaded language, we are going to need to leave this as an
area for further study. To at least some extent, that problem is tied
up with the very difficult issue of abstracting the ways in which
individual operating systems describe their own files into a canonical
structure for use as part of a network protocol. I think that folks who
want binary transport among potentially-heterogeneous hosts, and
especially those who want it across mail gateways, have some obligation
to address this problem very seriously.
If the real interest in binary transport only arises in homogeneous
"consenting adult" environments, then they should by all means propose
an envelope negotiation for that (which might even not use RFC-XXXX as a
message format--it is probably both more complicated and less
complicated than what is needed) and leave everyone else out of the
picture.
By contrast, there may be a case to be made for a new top-level
content-type structure in RFC-XXXX. In outline, it might be a two
(exactly) part message body called "object" or something like that,
with the first part containing the object description in some form
determined by the subtype and the second part containing, in "binary"
form (whether encoded or otherwise modified for transport or not), the
object content. It would permit a high-level way to transport and bind
file description information to data as well as to do some more complex
things that Nathaniel and I are interested in when we are not doing
"mail". And it is a little bit different from "application", although
one might be able to structure it as subtypes and sub-subtypes of
application.
Now that is a vague, strawman, suggestion that would need a lot of
working out.
But, while I'd also be happy to just see the issue disappear, I'd feel
a lot more comfortable about thinking about transport modifications for
binary if I saw the message format work being done--like the above--that
would make its use practical.
--john