What I see as problems to solve ... and a strawman solution


[Much of this has been said before. This is just my perspective on it.]

I've been thinking for a while about similar issues in a Usenet
context, but what I've come up with so far actually fits mail better;
on the Usenet side I still have some problems I don't really know how to
solve.

But, anyway, here's what I think about mail problems:

1. Mail should really be binary transparent (just like ftp can be with the  
right options - that's where mail came from, in the beginning ...)

1a. That means the silly 7 bit restriction has to go. No surprise here.

1b. Binary transfer is actually more or less solved: see BDAT.

1c. MIME multiparts ... ugh.

1d. Anyway, with full binary transparency, Content-Transfer-Encoding can  
go. Yay.

2. Mail headers have become incredibly baroque.

2a. One part is encoding lots of different character sets into ASCII,
using at least three, maybe five, and counting IDN maybe even six
different encoding methods.

2b. Another is a similarly excessive number of methods for structuring
header content further, which implies a number of different ways to
tokenize headers.

2c. Also, the meaning of many headers has become overloaded.

3. The spam problem has made it painfully obvious that the infrastructure  
must be significantly better secured.

3a. Server-to-server communications need to be authenticated.

3b. Client-to-server communications need to be authenticated.

3c. All of this must generate a trace that's easily machine-verifiable.

3d. (If you want anonymity, as has been suggested, layer it above this  
system.)

3e. End-to-end message signing has ugly problems.

3e1. It means signing converted text and relying on nobody reconverting
it, in a mail world where text is converted all the time.

3e2. No good way to sign mail headers, as they get munged all the time.


I think that should be enough for a start. Now for the strawman solution.  
Important design criterion: invent as little really new tech as you can  
get away with; use what already exists.


1. Define a binary variant of MIME, as follows:

* every entity has a Content-Length: nnnn header that must be
byte-accurate.
* no Content-Transfer-Encoding. Everything is treated like CTE: binary.
* no multipart boundary. Its job is done by the parts' Content-Length:
headers.
* probably make the default charset UTF-8 instead of ASCII.
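
To make the length-prefixed idea concrete, here is a rough sketch of how such an entity could be written and read back. Everything in it is my assumption, not a spec: the function names (encode_entity, decode_entity, decode_parts) are made up, and I simply borrow today's "headers, blank line, body" layout with CRLF line ends.

```python
def encode_entity(content_type: str, body: bytes) -> bytes:
    """Serialize one entity: headers, blank line, then exactly len(body) raw bytes."""
    # Assumed layout: CRLF-terminated header lines, as in today's MIME.
    head = f"Content-Type: {content_type}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("utf-8") + body


def decode_entity(data: bytes, offset: int = 0):
    """Parse one entity starting at offset; return (headers, body, next_offset)."""
    head_end = data.index(b"\r\n\r\n", offset)
    headers = {}
    for line in data[offset:head_end].decode("utf-8").split("\r\n"):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    start = head_end + 4
    end = start + int(headers["content-length"])
    if end > len(data):
        raise ValueError("Content-Length runs past the end of the data")
    # The body is taken by length only; its bytes are never inspected.
    return headers, data[start:end], end


def decode_parts(body: bytes):
    """Split a multipart body into its parts using Content-Length alone."""
    parts, offset = [], 0
    while offset < len(body):
        headers, part, offset = decode_entity(body, offset)
        parts.append((headers, part))
    return parts
```

Note that a part may contain NUL bytes, 8-bit data or even a blank line without confusing the reader, because nothing ever scans the body for a boundary string.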

2. Define an XML format for headers and envelopes. No non-XML sub-encoding  
- every piece of information and structure must be basic XML. There's  
enough there so other tricks just aren't necessary. (And namespaces seem  
of doubtful use in this context.) This also allows for defining better  
semantics for headers, as appropriate. Oh, and inside this thing, the
charset MUST be UTF-8, period.
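
As an illustration only, here is what such a header might look like when built and read with Python's standard xml.etree.ElementTree. The element names (header, from, name, address, to, subject) and the two helper functions are hypothetical; the point is just that structure lives in XML elements and non-ASCII text goes in as plain UTF-8, with no encoded-words or other sub-encodings.

```python
import xml.etree.ElementTree as ET


def build_header(from_name, from_addr, to_addrs, subject) -> bytes:
    """Build a header document; all element names are made up for this sketch."""
    root = ET.Element("header")
    sender = ET.SubElement(root, "from")
    ET.SubElement(sender, "name").text = from_name
    ET.SubElement(sender, "address").text = from_addr
    for addr in to_addrs:
        recipient = ET.SubElement(root, "to")
        ET.SubElement(recipient, "address").text = addr
    ET.SubElement(root, "subject").text = subject
    # UTF-8 output: non-ASCII characters are written directly, not escaped.
    return ET.tostring(root, encoding="utf-8")


def read_header(raw: bytes) -> dict:
    """Read the fields back using nothing but plain XML structure."""
    root = ET.fromstring(raw)
    return {
        "from_name": root.findtext("from/name"),
        "from_addr": root.findtext("from/address"),
        "to": [t.findtext("address") for t in root.findall("to")],
        "subject": root.findtext("subject"),
    }
```

A display name with angle brackets or an ampersand in the subject needs no special quoting rules beyond what XML already does, so there is only one way to tokenize the thing.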

3. Define a complete mail structure as follows:

<outer multipart>
part 1: XML envelope. Includes source, target, trace headers, crypto sig  
of part 2, whatever.
part 2: <main multipart>
        part 1: XML header. What we usually put in headers now.
        part 2: content. Possibly another multipart. Mostly as today.

Incidentally, that means if you concatenate a number of mails, all you
need to do is prefix them with a multipart header and you have another
valid binary-MIME object. That may be useful in some contexts.
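
A small sketch of the nesting and of the concatenation trick, reading from a stream this time. The helper names (entity, mail, read_entities) and the content types are my own placeholders, and the envelope and header bodies are dummy XML; only the outer/main layout follows the structure above.

```python
import io


def entity(content_type: str, body: bytes) -> bytes:
    """One length-prefixed entity (assumed layout: headers, blank line, body)."""
    head = f"Content-Type: {content_type}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("utf-8") + body


def mail(envelope_xml: bytes, header_xml: bytes, content: bytes) -> bytes:
    """Outer multipart = envelope + main multipart (header + content)."""
    main = entity("multipart/mixed",
                  entity("application/xml", header_xml)
                  + entity("text/plain", content))
    return entity("multipart/mixed",
                  entity("application/xml", envelope_xml) + main)


def read_entities(stream):
    """Yield (content_type, body) for each entity until the stream ends."""
    while True:
        line = stream.readline()
        if not line:
            return
        headers = {}
        while line not in (b"\r\n", b""):
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
            line = stream.readline()
        # Read exactly Content-Length bytes; the body is never scanned.
        yield headers["content-type"], stream.read(int(headers["content-length"]))
```

Concatenating two mails and wrapping them with entity("multipart/mixed", m1 + m2) gives one object that read_entities can walk level by level, down to the individual contents.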

4. Transport is much as it is today, with the following exceptions:

* Binary transport, of course. BDAT?
* Use the new XML envelope instead of the old SMTP commands. Possibly  
transmit envelope for inspection first and allow abort of mail at that  
point.
* every sending MTA MUST have at least one certificate, and this
certificate MUST be checked by the receiving MTA and recorded in a
trace header. Possibly even demand TLS-only, but I'm less convinced of
that.
* those certs really should be verifiable by the receiver - possibly via
something in DNS; definitely not by following the browser model and
effectively forcing every cert to be signed by a very few agencies like
Verisign.
* Probably a relaying MTA really ought to sign the trace info it  
generates.
* Absolute prohibition on modifying anything except the envelope, except  
by prior explicit agreement. (That is, a business's MTA might put these
stupid "this email is for you only if it really is for you" things into  
mail it sends, but a standard transit MTA does absolutely nothing to the  
mail proper. The common desire to not expose net internals can still be  
done as that's now in the envelope. On-the-fly content recodings are no  
longer needed. And signatures of the *entire* non-envelope part remain  
valid.)
* hopefully, a much stricter definition of the trace format.
* The submission MTA really ought to also sign the non-envelope part, as  
in "this is how the mail originated here". (The above business MTA would  
then be modeled as a re-submission.) Or maybe it signs a checksum while  
creating the first trace entry. This first trace entry should record where  
the mail came from, but need not do so in a format third parties can  
understand so long as the local admin can.
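
To show what "signs a checksum while creating the first trace entry" could mean, here is a toy sketch. Big caveat: it uses an HMAC with a shared key purely as a stand-in, because that is what the Python standard library offers; the real thing would be a public-key signature tied to the MTA's certificate, so that third parties can check it. The function names and the trace entry fields are invented for this example.

```python
import hashlib
import hmac
import json


def _payload(mta_name: str, digest: str) -> bytes:
    # Canonical form of what gets signed (hypothetical field names).
    return json.dumps({"mta": mta_name, "sha256": digest},
                      sort_keys=True).encode("utf-8")


def make_trace_entry(mta_name: str, mta_key: bytes, non_envelope: bytes) -> dict:
    """Record a checksum of the non-envelope part and sign that record."""
    digest = hashlib.sha256(non_envelope).hexdigest()
    # HMAC is only a placeholder for a certificate-based signature.
    sig = hmac.new(mta_key, _payload(mta_name, digest), hashlib.sha256).hexdigest()
    return {"mta": mta_name, "sha256": digest, "sig": sig}


def verify_trace_entry(entry: dict, mta_key: bytes, non_envelope: bytes) -> bool:
    """True only if the entry is authentic and the mail proper is unmodified."""
    expected = hmac.new(mta_key, _payload(entry["mta"], entry["sha256"]),
                        hashlib.sha256).hexdigest()
    untouched = hashlib.sha256(non_envelope).hexdigest() == entry["sha256"]
    return hmac.compare_digest(expected, entry["sig"]) and untouched
```

Since transit MTAs only ever touch the envelope, the checksum over the non-envelope part stays valid hop after hop, and any on-the-fly modification of the mail proper shows up as a failed check.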

I think that's all of the model I had thought out. Well, except for one
thing: it might be prudent to decide on a preferred non-plain-text format,
and that probably should not be HTML but a format that describes
structure, not appearance. Possibly some variant of DocBook might do.


Regards, Kai