[Much of this has been said before. This is just my perspective on it.]
I've been thinking for a while about similar stuff in a more Usenet
context, but what I've come up with so far actually fits better into mail:
on the Usenet side I still have some problems I don't really know how to
solve.
But, anyway, here's what I think about mail problems:
1. Mail should really be binary transparent (just like ftp can be with the
right options - that's where mail came from, in the beginning ...)
1a. That means the silly 7 bit restriction has to go. No surprise here.
1b. Binary transfer is actually more or less solved: see BDAT (the SMTP
CHUNKING extension, RFC 3030).
1c. MIME multiparts ... ugh.
1d. Anyway, with full binary transparency, Content-Transfer-Encoding can
go. Yay.
2. Mail headers have become incredibly baroque.
2a. One part is encoding lots of different char sets into ASCII, using at
least three ... maybe five ... with IDN, maybe even six different encoding
methods.
2b. Another is a similarly much-too-high number of methods for structuring
header content further. Which implies a number of different ways to
tokenize headers.
2c. Also, the meaning of many headers has become overloaded.
3. The spam problem has made it painfully obvious that the infrastructure
must be significantly better secured.
3a. Server-to-server communications need to be authenticated.
3b. Client-to-server communications need to be authenticated.
3c. All of this must generate a trace that's easily machine-verifiable.
3d. (If you want anonymity, as has been suggested, layer it above this
system.)
3e. End-to-end message signing has ugly problems.
3e1. It means signing converted text and relying on nobody reconverting
it, in a mail world where text is converted all the time.
3e2. No good way to sign mail headers, as they get munged all the time.
I think that should be enough for a start. Now for the strawman solution.
Important design criterion: invent as little really new tech as you can
get away with; use what already exists.
1. Define a binary variant of MIME, as follows:
* every entity has a Content-Length: nnnn header that must be byte-
accurate.
* no Content-Transfer-Encoding. Everything is like CTE: binary.
* No multipart-boundary. This function is solved by the part's Content-
Length: headers.
* probably make default charset be UTF-8 instead of ASCII.
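A minimal sketch of the length-framed entity from point 1, to show why a byte-accurate Content-Length makes both boundaries and transfer encodings unnecessary. The function names (encode_entity, decode_entity) and the CRLF header layout are assumptions for illustration, not part of any existing standard.

```python
def encode_entity(headers: dict[str, str], body: bytes) -> bytes:
    """Serialize one entity: header lines, a blank line, then the raw body bytes."""
    lines = [f"{name}: {value}" for name, value in headers.items()]
    # The length is computed from the body, so it is byte-accurate by construction.
    lines.append(f"Content-Length: {len(body)}")
    head = "\r\n".join(lines).encode("utf-8") + b"\r\n\r\n"
    return head + body


def decode_entity(data: bytes, offset: int = 0) -> tuple[dict[str, str], bytes, int]:
    """Parse one entity starting at offset; return (headers, body, next_offset)."""
    head_end = data.index(b"\r\n\r\n", offset)
    headers: dict[str, str] = {}
    for line in data[offset:head_end].decode("utf-8").split("\r\n"):
        name, _, value = line.partition(": ")
        headers[name] = value
    body_start = head_end + 4
    body_end = body_start + int(headers["Content-Length"])
    if body_end > len(data):
        raise ValueError("truncated entity")
    # The body is never scanned, so it may contain any byte sequence at all.
    return headers, data[body_start:body_end], body_end
```

Because the reader skips ahead by the declared length instead of searching for a delimiter, a body that happens to contain a blank line, a NUL byte or something that looks like a boundary needs no escaping.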
2. Define an XML format for headers and envelopes. No non-XML sub-encoding
- every piece of information and structure must be basic XML. There's
enough there so other tricks just aren't necessary. (And namespaces seem
of doubtful use in this context.) This also allows for defining better
semantics for headers, as appropriate. Oh, and inside this thing, charset
MUST be UTF-8 period.
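To make point 2 a bit more concrete, here is a rough sketch of what an all-XML header could look like, built with Python's standard xml.etree.ElementTree. The element names (header, from, to, subject, name, address) are invented for this example; the proposal itself does not fix a schema.

```python
import xml.etree.ElementTree as ET


def add_mailbox(parent: ET.Element, tag: str, name: str, address: str) -> None:
    """Display name and address are separate elements, so no sub-tokenizing is needed."""
    box = ET.SubElement(parent, tag)
    ET.SubElement(box, "name").text = name
    ET.SubElement(box, "address").text = address


def build_header(sender: tuple[str, str], recipients: list[tuple[str, str]],
                 subject: str) -> bytes:
    """Build a hypothetical XML header and return it as UTF-8 bytes."""
    root = ET.Element("header")
    add_mailbox(root, "from", *sender)
    for name, address in recipients:
        add_mailbox(root, "to", name, address)
    ET.SubElement(root, "subject").text = subject
    return ET.tostring(root, encoding="utf-8")


def read_header(raw: bytes) -> dict:
    """Read the header back with a plain XML parser, nothing mail-specific."""
    root = ET.fromstring(raw)
    return {
        "from": (root.findtext("from/name"), root.findtext("from/address")),
        "to": [(m.findtext("name"), m.findtext("address")) for m in root.findall("to")],
        "subject": root.findtext("subject"),
    }
```

Non-ASCII names and subjects go in as plain UTF-8 text, and the only escaping left is XML's own, which the parser handles.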
3. Define a complete mail structure as follows:
<outer multipart>
  part 1: XML envelope. Includes source, target, trace headers, crypto sig
          of part 2, whatever.
  part 2: <main multipart>
    part 1: XML header. What we usually put in headers now.
    part 2: content. Possibly another multipart. Mostly as today.
Incidentally, that means if you concatenate a number of mails, all you
need is to preface it with a multipart header and you have another valid
binary-MIME object. That may be useful in some contexts.
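The nesting and the concatenation property can be sketched in a few lines, assuming the length-framed entities from point 1. The helper names, the use of multipart/mixed for both levels and the application/xml type for envelope and header are placeholders chosen for this example.

```python
def entity(content_type: str, body: bytes) -> bytes:
    """One length-framed entity: two header lines, a blank line, raw body."""
    head = f"Content-Type: {content_type}\r\nContent-Length: {len(body)}\r\n\r\n"
    return head.encode("utf-8") + body


def multipart(parts: list[bytes]) -> bytes:
    """A multipart is just already-framed entities concatenated behind one header."""
    return entity("multipart/mixed", b"".join(parts))


def read_entity(data: bytes, offset: int = 0) -> tuple[str, bytes, int]:
    """Return (content_type, body, next_offset) for the entity at offset."""
    head_end = data.index(b"\r\n\r\n", offset)
    lines = data[offset:head_end].decode("utf-8").split("\r\n")
    headers = dict(line.split(": ", 1) for line in lines)
    start = head_end + 4
    end = start + int(headers["Content-Length"])
    return headers["Content-Type"], data[start:end], end


def split_parts(body: bytes) -> list[tuple[str, bytes]]:
    """Walk a multipart body part by part, using only the Content-Length headers."""
    parts, offset = [], 0
    while offset < len(body):
        ctype, part_body, offset = read_entity(body, offset)
        parts.append((ctype, part_body))
    return parts


def make_mail(envelope_xml: bytes, header_xml: bytes, content: bytes) -> bytes:
    """Outer multipart = envelope + main multipart (XML header + content)."""
    main = multipart([
        entity("application/xml", header_xml),
        entity("text/plain; charset=utf-8", content),
    ])
    return multipart([entity("application/xml", envelope_xml), main])
```

With these helpers, multipart([mail_a, mail_b]) is exactly mail_a + mail_b with one header in front, and split_parts gets the individual mails back unchanged.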
4. Transport is much as it is today, with the following exceptions:
* Binary transport, of course. BDAT?
* Use the new XML envelope instead of the old SMTP commands. Possibly
transmit envelope for inspection first and allow abort of mail at that
point.
* every sending MTA *must* have at least one certificate, and this
certificate MUST be interrogated by the receiving MTA and noted into a
trace header. Possibly even demand TLS-only, but I'm less convinced of
that.
* those certs really should be verifiable by the receiver - possibly via
something in DNS; definitely not by following the browser model and
effectively forcing every cert to be signed by a very few agencies like
Verisign.
* Probably a relaying MTA really ought to sign the trace info it
generates.
* Absolute prohibition on modifying anything except the envelope, except
by prior explicit agreement. (That is, a business's MTA might put these
stupid "this email is for you only if it really is for you" things into
mail it sends, but a standard transit MTA does absolutely nothing to the
mail proper. The common desire to not expose net internals can still be
done as that's now in the envelope. On-the-fly content recodings are no
longer needed. And signatures of the *entire* non-envelope part remain
valid.)
* hopefully, a much stricter definition of the trace format.
* The submission MTA really ought to also sign the non-envelope part, as
in "this is how the mail originated here". (The above business MTA would
then be modeled as a re-submission.) Or maybe it signs a checksum while
creating the first trace entry. This first trace entry should record where
the mail came from, but need not do so in a format third parties can
understand so long as the local admin can.
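A small sketch of the trace idea in point 4: the submission MTA records a checksum of the non-envelope part in the first trace entry, each later hop appends its own signed entry, and nobody touches the content. Two loud assumptions: HMAC with a per-MTA secret stands in for real certificate-based signatures (the Python standard library has no public-key signing), and trace entries are plain dicts serialized as JSON here rather than the XML the proposal calls for. All function and field names are invented.

```python
import hashlib
import hmac
import json


def sign_trace(entry: dict, key: bytes) -> dict:
    """Return a copy of entry carrying a MAC over its canonical JSON form."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    # Stand-in for a certificate-based signature; a real MTA would not use a shared secret.
    mac = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {**entry, "sig": mac}


def verify_trace(entry: dict, key: bytes) -> bool:
    """Recompute the MAC over everything except the 'sig' field and compare."""
    unsigned = {k: v for k, v in entry.items() if k != "sig"}
    payload = json.dumps(unsigned, sort_keys=True).encode("utf-8")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, entry.get("sig", ""))


def submit(content: bytes, mta: str, key: bytes) -> list[dict]:
    """Submission MTA: the first trace entry records a checksum of the content."""
    digest = hashlib.sha256(content).hexdigest()
    return [sign_trace({"mta": mta, "hop": 0, "content_sha256": digest}, key)]


def relay(trace: list[dict], mta: str, key: bytes) -> list[dict]:
    """Transit MTA: append a signed entry to the envelope trace, nothing else."""
    return trace + [sign_trace({"mta": mta, "hop": len(trace)}, key)]


def content_intact(trace: list[dict], content: bytes) -> bool:
    """Receiver: check the content still matches what the submission MTA recorded."""
    return trace[0]["content_sha256"] == hashlib.sha256(content).hexdigest()
```

Since relays only ever add to the trace in the envelope, the checksum from hop 0 stays valid end to end; any MTA that appends a disclaimer to the content would have to act as a re-submission and start a new trace.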
I think that's all of the model I had thought out. Well, except for one
thing: it might be prudent to decide on a preferred non-plain text format,
and that probably should not be HTML but a format that does structure, not
optics. Possibly some variant of Docbook might do.
Regards, Kai