Leo J. McLaughlin III writes...
We *need* binary transport because [see text of messages from last
seven months]. Yes, I do intend to show/run arbitrary data (such as
bitmaps, lotus spreadsheets, and MS-word docs) without PEM. Yes, I do
intend to allow users to specify proper defaults for declining
information for particular applications and/or from particular hosts.
Leo,
I may be the only one who feels this way, so, if no one else speaks
up, I'll drop this line of inquiry with this message, which is intended
largely to clarify my (possibly unique) position.
The statement above makes a strong case that it is useful to be able
to transport binary information. But I don't think that has ever been
in doubt: as soon as one says "transport over mail is a reasonable file
transfer mechanism", then it follows almost immediately that one will
want to transport "binary" things.
What I'm still not grasping is why this, of necessity, implies binary
transport.
Maybe it is helpful to look at some history and a few cases.
Within the strict IP internet, we have done binary file transfer for
years with FTP. It works, and permits (plus or minus intra-system
idiosyncrasies which would appear with any binary transfer) showing/
running things on arrival. How safe that is depends on the relationship
between source and target systems, and I suppose it would be reasonable
to assert that most of us live a bit more dangerously than the truly
security-paranoid might encourage. Now in the FTP case, since data
transmission is out-of-band, what arrives is a file, ready to show/
run/whatever. But there are three major perceived restrictions with
FTP:
(i) I've got to be able to validate myself on the remote machine--
usually by having an account there--to do a "send",
(ii) both machines have to be functioning and reasonably well
connected at the same time-- there is no multihop store-and-forward
arrangement possible except in a few odd (and rarely supported)
third-party transfer situations-- and
(iii) it only works within the IP internet, not through, e.g., the
type of gateways that can handle mail.
The first of these restrictions is, I think, the source of the demand
for sender initiated, password-free, file transfer within the internet.
That doesn't necessarily have anything to do with mail: it is pretty
easy to "fix" FTP to support sender-initiated password free transfers.
Some of the "fixes" have been kicked around and even implemented; a few
don't even require changes to the protocol itself and are moderately
secure.
The third is the source of the desire to mail this stuff.
And, depending on one's goals, interests, and ways of looking at
things, the second problem is either part of the third (i.e., "we know
how to deal with this in mail") or an argument for much more
sophistication--e.g., greater capability for batching, automated retry,
checkpoint/restart, and possibly more support for third-party transfer--
in FTP.
The thing that joins the three problems is the perception that they
are all the same (depends on how you look at the world) and that one
should minimize the number of mechanisms the user needs to understand (I
can certainly agree with that one).
There is one other historical problem with FTP implementations
(probably not the protocol), and that is that we tend not to carry
around quite enough information to describe a non-text file that is
being moved around the network. Our file descriptors have gotten a lot
more complicated than they were fifteen years ago, and FTP
implementations have not kept up. The problem is isomorphic with trying
to figure out what might be required to follow "content-type: binary/"
in an RFC-XXXX header so the file can be sensibly reconstructed.
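To make the parallel concrete, here is a minimal sketch of the kind of descriptive record that would have to ride along with a non-text file so the receiver can rebuild it. All of the X-File-* field names below are invented for illustration; nothing here is part of FTP, RFC-XXXX, or any proposal on the table:

```python
# Hypothetical sketch: descriptive fields a non-text file transfer would
# need to carry, serialized as RFC-822-style header lines.  Every field
# name here is invented for illustration only.
def format_descriptor(fields: dict) -> str:
    """Sender side: serialize descriptor fields as header lines."""
    return "".join(f"X-File-{name}: {value}\r\n" for name, value in fields.items())

def parse_descriptor(text: str) -> dict:
    """Receiver side: recover the descriptor fields."""
    fields = {}
    for line in text.splitlines():
        if line.startswith("X-File-"):
            name, _, value = line[len("X-File-"):].partition(": ")
            fields[name] = value
    return fields

descriptor = {"Name": "budget.wk1", "Record-Format": "fixed",
              "Record-Length": "512", "Origin-System": "VMS"}
assert parse_descriptor(format_descriptor(descriptor)) == descriptor
```

The point of the sketch is how quickly the field list grows once one tries to cover the file models of more than one operating system.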
Now, because it represents a major collection of experience with
implementation and long-term use, let's take a look at BITNET. They
have had sender-initiated password free file transfer since (their) day
one. To all intents and purposes, mail rides on top of it, rather than
vice versa. Those files arrive on my BITNET machine identified as
file-objects, not as mail-objects. That identification does not occur
as a consequence of decoding a mail header, or even as a result of
decoding a *mail* transport envelope: it is in the *real* transport
envelope; in the Internet's layering, it would be the near-equivalent of
the TCP header carrying the information. That outermost envelope also
contains the file name, which is pretty handy. Like FTP, it also
doesn't work across mail-type gateways (there have been intra-BITNET
discussions about using RFC-XXXX as a transport for moving BITNET file
transfers across the internet between NJE networks).
Partially as a consequence of the fact that the transport layer is
pretty exposed with this stuff, and the absence of Mail transport agents
on the sending side (the user typically passes the file directly to the
low-level transport agent), the state of validation of the sender is
typically a tad higher than with mail. Not "secure" or "authenticated",
but it takes a bit more to fake a message.
If one of those files arrives on my machine, and is "binary", what I
do depends on all the usual stuff. I'm not as paranoid in practice as I
am in theory. If I know the sender, and have reasonable confidence that
the file isn't somehow forged, and was expecting that sender to send me
something, then I may execute the thing. My confidence is going to be
higher if the file originates within "my" network than if it originates
somewhere that I've never heard of. And so on. The less confidence I
have about the file, the more validating I'm going to do before trying
to execute/show/run it.
So one could implement a close approximation to the BITNET capability
over FTP or over some new protocol that might provide, even, for
intra-internet store and forward. While the argument is less strong, I
don't think those solutions (especially the latter) fly for the same
reasons that we stopped talking about "new mail" on new ports. But the
important thing is that we are still intra-internet. If the "need" is
for that, or for rapid and clutter-free movement of data among a
collection of mutually trusted sites (a subnet somewhere?), then one
might be able to decide that those sites were all interoperable at the
binary-transport level and avoid the "need" for conversion gateways. It
is mostly the conversion gateways, and the consequent RFC-XXXX
implications, that bother me, not the binary transport.
Ah, but the third case applies, and you really want to send this binary
information through mail gateways. But, at those gateways, the
information is likely to undergo conversion of some form. It may have
to be turned into characters, it may have to go over a transport that is
native in some other character set, it may have to be forced into
"lines". And any of those things may require not only conversion of the
bits into, e.g., base64, but additional file descriptive information so
that the thing can be structurally reconstructed at the far end. The
ISO experience (and this is one of the areas where they *do* have
experience) is that this quickly takes you down the slippery slope
toward data descriptive languages, data element dictionaries, and other
complexities that I'd be very happy to keep out of a mail protocol that
works well precisely because it is pretty simple.
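For reference, the "turned into characters" step at such a gateway is essentially the following, sketched here with Python's standard base64 machinery (the 76-character line width is the conventional mail-safe limit; the function names are mine):

```python
import base64
import textwrap

def to_mail_safe(data: bytes, width: int = 76) -> str:
    """What an 8->7 gateway must do: turn arbitrary bytes into
    line-oriented base64 text that any mail transport can carry."""
    b64 = base64.b64encode(data).decode("ascii")
    return "\r\n".join(textwrap.wrap(b64, width))

def from_mail_safe(text: str) -> bytes:
    """Receiving side: strip the line structure and decode."""
    return base64.b64decode("".join(text.split()))

payload = bytes(range(256))          # arbitrary binary content
wire = to_mail_safe(payload)
assert from_mail_safe(wire) == payload
```

The bits survive; what base64 alone does *not* carry is any of the structural description discussed above, which is exactly where the slippery slope begins.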
And, because of the technical problems associated with all of this,
there is a very strong case to be made for end-to-end checksums.
Trashed executables probably just won't execute, but trashed database
updates...
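An end-to-end check is cheap to sketch: digest the original bytes on the sending side, before any gateway conversion, and verify after final decoding. The digest choice and function names below are mine, for illustration only:

```python
import base64
import hashlib

def wrap_with_checksum(data: bytes) -> tuple[str, str]:
    """Sender side: digest the *original* bytes, so corruption in any
    intermediate conversion is caught end to end."""
    digest = hashlib.md5(data).hexdigest()
    return base64.b64encode(data).decode("ascii"), digest

def unwrap_and_verify(encoded: str, digest: str) -> bytes:
    """Receiver side: decode, then refuse content that fails the check."""
    data = base64.b64decode(encoded)
    if hashlib.md5(data).hexdigest() != digest:
        raise ValueError("content corrupted in transit")
    return data
```

The trashed-database-update case is precisely the one where the decode succeeds, the bytes look plausible, and only the checksum tells you anything went wrong.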
By the time one gets through with all of that fussing around, any
performance advantages of having binary transport capability have gone
south.
Note that, if the message starts in structured RFC-XXXX, the user can
supply the descriptive information in another message part or otherwise
organize things properly (let me come back to that below).
So I'm having trouble understanding the need for--or the technical
feasibility and meaning of--binary transfer across mail gateways.
Within the Internet, "binary transport" still has a limited meaning
unless one does it with FTP or a new protocol. There have been no
proposals to change RFC822 or XXXX to remove the line-orientation of
headers, message part boundaries, and the like. So this is presumably
binary stuff embedded within very line-oriented messages. Untangling
that, including finding the content-type fields and subtypes and
interpreting them, isn't trivial--it isn't like picking up a BITNET
file, or an FTPed file, and just reading/showing/running it. One has to
parse the thing, using the lines where there are lines and, presumably,
treating the binary stuff as a very long line which ends before the next
(line-oriented) body part delimiter. And you need counts to find "end
of binary material", with all of the validity threats that exist for
counts.
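The count problem can be made concrete with a few lines. This naive extractor (names and framing invented for illustration; this is not a proposed parser) locates the binary part by byte count and then checks that the next thing really is the part delimiter:

```python
def extract_binary_part(message: bytes, boundary: bytes, length: int) -> bytes:
    """Naive sketch: pull a binary body part out of a line-oriented
    message by byte count, then cross-check the count against the
    delimiter position -- the 'validity threat' of counts."""
    start = message.index(b"\r\n\r\n") + 4      # end of the part headers
    body = message[start:start + length]
    tail = message[start + length:]
    if not tail.startswith(b"\r\n--" + boundary):
        raise ValueError("length field disagrees with boundary position")
    return body
```

Note that the binary body may itself contain bytes that look exactly like the delimiter, which is why the count is load-bearing: if it is off by even one, the parse either fails or silently misframes the data.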
Given that amount of parsing and decoding I'd guess that the marginal
costs of designing into RFC-XXXX, if you want it, an eight-bit
line-oriented model that would:
--be content-type: binary-plus (or maybe "binary")
--insert, as base64 does, non-significant CRLF sequences at
appropriate intervals.
--escape "real" CR LF sequences in some appropriate way that does not
get confused with the fake ones.
--if 8->7 conversions are to be attempted, invent "KHFO: base64NoCRLF"
to tell the converter that the "bare" CRLF sequences are to be discarded
in going to base64, not converted. "KHFO: base64NoCRLF" converts to
content-transfer-encoding: base64, not to something else special.
No transport changes other than the 8bit line-oriented ones and
fairly trivial (given everything else) changes to any gateways which you
would actually expect to do conversions (although I'm still dubious
about that).
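To show that the sketch above is mechanically simple, here is one possible realization of the escaping rules. The escape byte, the "L" marker, and the line interval are all invented for illustration; any real proposal would have to pick these carefully:

```python
# Hypothetical 'binary-plus' codec: escape real CRLF sequences, then
# insert non-significant CRLFs at fixed intervals so the transport still
# sees 'lines'.  ESC, the 'L' marker, and the interval are invented.
ESC = b"\xff"

def encode_binary_plus(data: bytes, interval: int = 64) -> bytes:
    # Escape ESC first, then real CRLFs, so the two never collide.
    stuffed = data.replace(ESC, ESC + ESC).replace(b"\r\n", ESC + b"L")
    chunks = [stuffed[i:i + interval] for i in range(0, len(stuffed), interval)]
    return b"\r\n".join(chunks)             # the fake, non-significant CRLFs

def decode_binary_plus(wire: bytes) -> bytes:
    joined = wire.replace(b"\r\n", b"")     # discard the fake CRLFs
    out, i = bytearray(), 0
    while i < len(joined):
        if joined[i:i + 1] == ESC:
            out += b"\r\n" if joined[i + 1:i + 2] == b"L" else ESC
            i += 2
        else:
            out += joined[i:i + 1]
            i += 1
    return bytes(out)
```

Because the escaping guarantees that no real CRLF survives in the stuffed data, the decoder can strip every CRLF it sees without ambiguity, which is the whole trick.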
--or--
You decide that the users with/for whom you "expect to show/run
arbitrary data..." are really operating in a mutually-trusted enclave
environment in which everyone is using the TCP/IP internet and running
mail software at the same capability level. If that is the case, then
perhaps we should be working out the details of no-conversion enclave
protocols for binary transport. Or we should be discussing a
sender-initiated file transfer protocol on a new port with arrangements
for transforming into mail-reader formats once the stuff arrives (if you
really want to do that).
What I'm trying to say, Leo, is that there is no question that being
able to transfer binary information over mail is a requirement. But I'm
convinced, based on some significant (however odd) experience and
perspective, that binary transport over SMTP is likely to turn out to be
a very poor engineering idea. That doesn't spell "impossible", so
showing me that you know how to do it in the strictly-connected
internet wouldn't prove anything.
So what I'm trying to find out is exactly how you expect to use this,
among whom, and why. And I'm trying to understand that in enough detail
to be able to evaluate, for myself, your implied claim that simply
stuffing the material into base64 or using some near-binary
system, as outlined above, is inadequate/unacceptable and why. And,
finally, if you can convince me/us that binary transport is really a
requirement, I'm trying to figure out if one has to convert that at 8->7
boundaries, and, if so, what those conversions might mean, both in terms of the
transformations and in terms of how much more complicated it makes the
gateways (marginally, over the requirements already implied by 'no
nested encoding' RFC-XXXX, of course).
As I said at the beginning, if I'm the only one with these concerns, I
hereby drop this line of discussion. So, if there are others out there,
let's hear from them.
A final observation, which won't go away, and [another] associated
strawman. The observation has been made as part of the "content-type"
discussion that we really don't understand subtypes of "binary", that,
in CCITT's dreaded language, we are going to need to leave this as an
area for further study. To at least some extent, that problem is tied
up with the very difficult issue of abstracting the ways in which
individual operating systems describe their own files into a canonical
structure for use as part of a network protocol. I think that folks who
want binary transport among potentially-heterogeneous hosts, and
especially those who want it across mail gateways, have some obligation
to address this problem very seriously.
If the real interest in binary transport only arises in homogeneous
"consenting adult" environments, then they should by all means propose
an envelope negotiation for that (which might even not use RFC-XXXX as a
message format--it is probably both more complicated and less
complicated than what is needed) and leave everyone else out of the
picture.
By contrast, there may be a case to be made for a new top-level
content-type structure in RFC-XXXX. In outline, it might be a two
(exactly) part message body called "object" or something like that,
with the first part containing the object description in some form
determined by the subtype and the second part containing, in "binary"
form (whether encoded or otherwise modified for transport or not), the
object content. It would permit a high-level way to transport and bind
file description information to data as well as to do some more complex
things that Nathaniel and I are interested in when we are not doing
"mail". And it is a little bit different from "application", although
one might be able to structure it as subtypes and sub-subtypes of
application.
Now that is a vague, strawman, suggestion that would need a lot of
working out.
But, while I'd also be happy to just see the issue disappear, I'd feel
a lot more comfortable about thinking about transport modifications for
binary if I saw the message format work being done--like the above--that
would make its use practical.
--john