more comments on draft-crocker-email-arch-00



Firstly, I'd like to thank Dave for writing this document, because I've
been meaning to do so myself when I finish putting my MXs back together.
A shorter TODO list is a happy TODO list :-) I have some extensive
comments based on my earlier thoughts on the subject. Apologies for the
length of this message...

----

First, some thoughts that will inform my comments on the draft.

I identify three layers in Internet email, different from the layers in
the draft. My layers have the property that they each admit different
implementations independent of the other layers. In fact, not only is this
possible but it has already happened at least once for all of them. The
boundaries between these layers are a little bit unconventional and are
certainly not clean, but this is a post-hoc architecture so beauty is
inevitably hard to find.

e2) transport layer i.e. SMTP.

Other implementations include LMTP, batch SMTP, UUCP, and way back in the
ARPANET days, FTP.

Apart from the last of these, they are all store-and-forward protocols that have
built-in accommodation for delays (from failures or dial-ups or quotas),
retries (to deal with delays), gateways (protocol translation, security
boundaries), and the lack of end-to-end connectivity between the sender
MUA and the recipient message store. SMTP is unusual in distinguishing
between permanent and temporary failures in a consistent manner.
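
As a rough illustration of that distinction (not taken from the draft):
the first digit of an SMTP reply code tells the client whether to stop,
retry later, or give up and bounce. The function name and the action
strings below are made up for this sketch.

```python
def next_action(reply_code: int) -> str:
    """Decide what a sending MTA does with a final SMTP reply code."""
    kind = reply_code // 100
    if kind == 2:
        return "done"    # 2yz: positive completion
    if kind == 4:
        return "retry"   # 4yz: temporary failure, keep the message queued
    if kind == 5:
        return "bounce"  # 5yz: permanent failure, report to the return path
    raise ValueError(f"unexpected final reply code {reply_code}")
```

A queue runner would keep a message spooled on "retry" and generate a
delivery failure report on "bounce".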

(As an alternative, consider sending email by using an anonymous IMAP
login to the recipient's message store and APPENDing the message to the
inbox -- or even the equivalent using WebDAV :-) How would you prevent
spam in this scenario?)

e3) email addresses

Here I'm including all addresses, whether in the message header or in the
envelope.

Originally these were host-based source-routed addresses, e.g.
<@smtp.hermes.cam.ac.uk:dcrocker(_at_)brandenburg(_dot_)com> or
<@mx.cam.ac.uk:fanf2(_at_)cyrus-1(_dot_)hermes(_dot_)cam(_dot_)ac(_dot_)uk>.
These are semantically very similar to UUCP bang paths, which could be
another implementation of this layer were it not for syntactic
restrictions in RFCs 821 and 822.
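
To make the similarity concrete, here is a minimal sketch that splits an
RFC 821 style source route into its hops and final mailbox, then renders
the same route as a bang path. The function names and the example.*
hosts are invented for illustration, and the parser ignores address
literals and quoted local parts.

```python
def parse_source_route(path: str) -> tuple[list[str], str]:
    """Split '<@hop1,@hop2:user@domain>' into (hops, mailbox)."""
    inner = path.strip()
    if inner.startswith("<") and inner.endswith(">"):
        inner = inner[1:-1]
    if not inner.startswith("@"):
        return [], inner  # no source route, just a mailbox
    route, sep, mailbox = inner.partition(":")
    if not sep:
        raise ValueError("source route without a mailbox")
    hops = [hop.strip().lstrip("@") for hop in route.split(",")]
    return hops, mailbox


def to_bang_path(path: str) -> str:
    """Render the same route in UUCP style: hop!hop!host!user."""
    hops, mailbox = parse_source_route(path)
    user, _, domain = mailbox.rpartition("@")
    return "!".join(hops + [domain, user])
```

For example, to_bang_path("<@relay.example:user@dest.example>") gives
"relay.example!dest.example!user".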

For the last 20 years we've had an idea of mail domains distinct from
hostnames and advertised in the DNS using MX records. Most of the subtlety
of Internet email exists in how addresses are used, and they are the main
focus of current validation/authentication standards efforts.

Perhaps there will be further evolution. At the moment there's a general
assumption that a valid email address may be used in any context. However
setups are becoming increasingly strict, such that some addresses are only
valid as destination addresses (therefore which may NOT appear in MAIL
FROM or as the RCPT TO of a bounce), and some are only valid as return
addresses (e.g. VERP or SRS or BTAV addresses which may ONLY appear in
MAIL FROM or as the RCPT TO of a bounce).
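
The rule in the previous paragraph can be written down as a small
predicate. This is only a sketch: the three address kinds and the
context labels are names invented here, not terms from the draft or
from any standard.

```python
# Hypothetical labels for address kinds and envelope contexts.
GENERAL = "general"
DESTINATION_ONLY = "destination-only"
RETURN_ONLY = "return-only"  # e.g. VERP, SRS or BTAV addresses

MAIL_FROM = "MAIL FROM"
RCPT_TO = "RCPT TO"
BOUNCE_RCPT_TO = "RCPT TO of a bounce"

# Contexts in which an address is acting as a return address.
RETURN_CONTEXTS = {MAIL_FROM, BOUNCE_RCPT_TO}


def may_appear(kind: str, context: str) -> bool:
    """Return True if an address of this kind is acceptable here."""
    if kind == DESTINATION_ONLY:
        return context not in RETURN_CONTEXTS
    if kind == RETURN_ONLY:
        return context in RETURN_CONTEXTS
    return True  # the traditional assumption: valid anywhere
```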

e4) message content

Originally email was just bare ASCII, but now we have MIME. As well as
MIME this layer includes most of the RFC 822 header. However parts of the
header are covered by lower layers, such as the Received: trace fields
(transport layer) and email addresses in the addressing fields
(Resent-)Sender/From/To/CC/BCC.

An odd result of the way I have divided the layers is that part of an
addressing field is in the content layer (the display-name which may be
MIME-encoded) and part is in the address layer (er, the address itself).
This sort of makes sense if you consider the message content to be what a
non-technical user is interested in, and for them addressing is often
tucked away inside the user interface to a directory and hidden behind
the display-name.

----

A bad layering analogy.

If you wondered why I started the numbering above at 2, it's because I'm
about to make a cute comparison with the OSI model as it is usually
applied to the Internet.

SMTP is to email as Ethernet (etc.) is to the Internet, i.e. a message
takes multiple SMTP hops to get to its destination just as a packet takes
multiple LAN or WAN hops. MTA <-> router, queue <-> packet buffers.

Email addresses are the bottom-most end-to-end feature in email, as are IP
addresses on the Internet. In this analogy, MX lookups correspond to ARP,
translating a layer 3 object (email/IP address) into a layer 2 one (IP/MAC
address).

Message content obviously corresponds to packet content. More amusingly,
some higher-level MIME features such as message fragmentation have direct
counterparts at the TCP level.

A tempting comparison is between email address aliasing (as in the Sieve
redirect action) and NAT. Many people in the MARID camp claim that email
aliasing is evil bad and wrong because it breaks the assumption that the
SMTP originator information must correspond to the email address
originator information. The apparent destination address isn't the actual
destination if it's an alias. However this comparison is bogus: the
destination of the alias is still globally routable, and it still sees the
original return path not one pointing to the aliasing MTA.

The analogy breaks down here, which is why I consider it to be bad and
good only for a bit of fun.

----

Identities.

I'm not very happy with the concept of "identities" used in the draft. It
bundles together layer e2 and layer e3 information -- there are big tables
about "setting" identities, some of which settings only last for the
lifetime of a connection and some of which for the lifetime of a message.
Is a Received: trace field really an identity that is set? I would say
that the EHLO domain is "stated" rather than "set".

The draft also omits almost all consideration of the layer e4 identities
apart from a brief mention of List-*. It would be worth including the
Message-ID: of a message and how it refers to previous messages using the
References: and In-Reply-To: fields. The assumption that Message-IDs are
unique and are sometimes used for duplicate suppression should be
mentioned. The Content-ID may also be worth considering, if only as a
reference to the MIME RFCs.
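
For instance, duplicate suppression keyed on Message-ID: might look
roughly like the sketch below. It assumes that Message-IDs really are
unique, which is exactly the assumption worth documenting; the function
name is made up, and messages without a Message-ID: are always kept.

```python
from email.message import EmailMessage


def suppress_duplicates(messages):
    """Keep the first message seen for each Message-ID."""
    seen = set()
    kept = []
    for msg in messages:
        msg_id = msg["Message-ID"]
        if msg_id is not None:
            key = str(msg_id).strip()
            if key in seen:
                continue  # same Message-ID seen before: treat as a copy
            seen.add(key)
        kept.append(msg)
    return kept
```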

I think that this system of presentation is particularly confusing when it
comes to the process of message submission.

A quick note on 2.1 Mailbox Addresses:

"returned to its originator" implies that local parts are created by
the entity identified by the domain part of an address. This is not
necessarily the case, e.g. in the case of local part suffixes like
fanf2+smtp(_at_)cam(_dot_)ac(_dot_)uk which can often be created by the
MUA without reference to the recipient system. Perhaps a better phrasing
would be "presented to the recipient system".

----

3.1.1 / 3.1.2. Message submission.

The draft states that the Sender: is set by the MUA. It is often overridden
by the MSA to refer to the authenticated address of the sender, as a
protection against spoofing.

The draft omits to say that the MSA will extract the BCC: addresses
when it is creating the initial envelope recipient list, and remove the
BCC: field or leave it empty.

There should probably be some mention of other submission-time fix-ups,
like creation of the Message-ID: and Date: fields.
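
A minimal sketch of these submission-time steps, using Python's
standard email package. The function name and the default domain
"msa.example" are invented for illustration, and a real MSA does a good
deal more (authentication, Sender: policy, and so on).

```python
from email.message import EmailMessage
from email.utils import formatdate, getaddresses, make_msgid


def prepare_submission(msg: EmailMessage, domain: str = "msa.example"):
    """Return the envelope recipient list and fix up the header in place."""
    # Collect every recipient, including the blind copies.
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields.extend(str(value) for value in msg.get_all(name, []))
    recipients = [addr for _name, addr in getaddresses(fields) if addr]

    # The Bcc: field must not reach the other recipients.
    del msg["Bcc"]

    # Add the fields that the submitter left out.
    if msg["Date"] is None:
        msg["Date"] = formatdate(localtime=True)
    if msg["Message-ID"] is None:
        msg["Message-ID"] = make_msgid(domain=domain)
    return recipients
```

This is roughly the job that `sendmail -t` does; with SUBMISSION the MUA
would do the recipient extraction itself and pass the result in RCPT TO.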

In practice the boundary between the MUA and MSA is more blurry than the
draft depicts. The description in the draft corresponds to the division
that is common when the MUA and MSA are different pieces of software
running on the same host (e.g. `sendmail -t` on Unix). However from the
point of view of Internet protocols (in particular the SUBMISSION
protocol), the MUA takes responsibility for creation of the initial
envelope and fix-up of the BCC: field.  The Message-ID: can be set by
either the MUA or the MSA depending on the phase of the moon and the
direction of the wind.

There also needs to be some discussion of the process of more unusual
submission scenarios, such as re-sending or mailing list dispatch. The
latter in particular requires the MUA to have direct control over the
message envelope as in SUBMISSION but not provided by `sendmail -t`.

(At least this end of the process isn't as complicated as the reception
end...)

----

4.1. Envelope

The description in the draft is rather different to the common meaning of
"envelope". The word is usually used to mean the message transmission
information that comes before DATA in the SMTP protocol. Some of it may
appear in the header (e.g. in the Received: trace if there's only one
recipient, or in Return-Path: after final delivery), but that's an
after-the-fact record of what was going on.

From the point of view of my layers the envelope is a sublayer of e3,
created from the message header during submission and used for subsequent
transmission of the message to its destination.

----

5. Two levels of store-and-forward

I'm rather unhappy with this section, especially the title -- I'm not sure
if there are levels as such and I'm not sure if there are two of them. The
list of actions is fine, though I would present them in a different order.
I prefer to define the actions in terms of what is done rather than who is
doing it, because the MTA/MDA/MUA distinction gets very blurred -- e.g.
5.2.5 is titled "MUA alias handling" but goes on to talk about MDAs (which
is where it is usually implemented, though it may occur in the MTA or
MUA). This approach also escapes from the false MTA/MUA dichotomy.

There are two important things that may occur when a message is passed on:
its reverse path may change or not (an e3 alteration by my layering), and
its content may change or not (an e4 alteration). These two are
independent of each other.

My ordering is intended to be a smooth progression from relay-like to
reply-like (though it's a REALLY BIG STRETCH to say that a reply to a
message is a kind of forwarding action).
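
The classification worked through in the subsections that follow can be
summarised as data. This is only a sketch of my reading: the class and
field names are invented, the "security gateway" row infers an unchanged
reverse path from "leave the addresses the same", and translating
gateways are left out because they are not classified this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    reverse_path_changed: bool  # an e3 alteration
    content: str                # what happens at layer e4


# Ordered from relay-like to reply-like.
ACTIONS = [
    Action("relaying", False, "unchanged"),
    Action("aliasing", False, "unchanged"),
    Action("security gateway", False, "changed"),
    Action("mailing list", True, "may be changed"),
    Action("re-sending", True, "not usually changed"),
    Action("forwarding", True, "new message encapsulating the original"),
    Action("replying", True, "new message with fragments of the original"),
]


def keeps_reverse_path(actions):
    """Names of the actions that leave the reverse path alone."""
    return [a.name for a in actions if not a.reverse_path_changed]
```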

5.1 Relaying

No change of reverse path or message content.

Note that the envelope often does change when a message is relayed,
because subsets of the envelope recipient addresses may route to different
hosts. The draft says that relaying doesn't involve changing addresses in
the envelope, which is correct but a bit misleading.
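
To illustrate: a relay holding one message with several recipients may
end up sending several envelopes, each with the same return path and an
untouched subset of the recipients. The sketch below groups by the
domain part as a stand-in for real routing (MX lookups and so on); the
function name is made up.

```python
from collections import defaultdict


def split_envelope(mail_from: str, rcpt_to: list[str]) -> dict:
    """Group recipients by domain; every group keeps the same MAIL FROM."""
    groups = defaultdict(list)
    for rcpt in rcpt_to:
        domain = rcpt.rpartition("@")[2].lower()
        groups[domain].append(rcpt)  # the address itself is not rewritten
    return {domain: (mail_from, rcpts) for domain, rcpts in groups.items()}
```

No address is altered, yet none of the onward envelopes is identical to
the one that arrived.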

5.2.5 Aliasing

No change of reverse path or message content.

This section should mention that this action is sometimes called
forwarding (after the unix .forward file) or redirecting (after the Sieve
action command that causes it).

This section's explanation of the reason for leaving the return path
unchanged could be improved. It mentions that an aliasing arrangement is
the responsibility of the owner of the original RCPT TO address, but
doesn't state that in the usual case there is no way to inform this person
of onward delivery problems. They set up the aliasing because they don't
read email delivered there (and may not even have a mailbox to keep it
in). Naively changing the MAIL FROM would cause errors to be lost.
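
In envelope terms, aliasing rewrites RCPT TO and nothing else. A minimal
sketch, with an invented Envelope type and a plain dict standing in for
the alias table:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Envelope:
    mail_from: str
    rcpt_to: tuple[str, ...]


def expand_aliases(env: Envelope, aliases: dict) -> Envelope:
    """Replace aliased recipients with their targets; keep MAIL FROM."""
    new_rcpts = []
    for rcpt in env.rcpt_to:
        new_rcpts.extend(aliases.get(rcpt, (rcpt,)))
    # mail_from is deliberately left alone, so that onward delivery
    # failures still go back to the original sender.
    return replace(env, rcpt_to=tuple(new_rcpts))
```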

The comment about the use of aliasing for very basic mailing lists
should be moved here from the mailing lists section and cross-referenced.

5.2.4 Gatewaying

I think there are roughly two kinds of gateway, which should be kept
more distinct in this document.

Security gateways that do content filtering, but otherwise act like MTAs.
Often found policing the email of Internet edge networks. They leave the
addresses the same but change the content.

Gateways that translate Internet email into a technically different
messaging environment. These make syntactic changes that should try to
preserve semantics. I'm not sure that referring to parts of Internet
message standards in this context is helpful.

5.2.6 Mailing lists

Reverse path changed to list management; message content may be changed for
list identification and/or security/policy purposes.

5.2.2 Re-sending

Reverse path changed to re-sender and Resent- headers added. Message
content not usually changed.

From the addressing point of view this is very similar to lists apart from
the lack of automation and the details of the conventional changes to the
message header.

Is it worth noting that Pine calls this "bouncing" (a catastrophic
malapropism)?

5.2.1 Forwarding

New message (therefore different reverse path) with content that includes
an encapsulated copy of the original message.

I don't think there are very many standards or conventions for the
handling of layer e4 identities in this situation (i.e. inclusion of the
Message-ID of the original message in the References: or In-Reply-To:
fields of the new one).

This section needs to refer to the message/rfc822 standard, and it should
mention that non-standard encapsulations are often used instead.
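
For reference, the standard encapsulation looks roughly like this with
Python's email package, which attaches a message object as a
message/rfc822 part. The function name and the "Fwd: " subject prefix
are assumptions of this sketch rather than anything the draft specifies.

```python
from email.message import EmailMessage


def forward(original: EmailMessage, sender: str, to: str, note: str):
    """Build a new message that carries the original as message/rfc822."""
    fwd = EmailMessage()
    fwd["From"] = sender
    fwd["To"] = to
    fwd["Subject"] = "Fwd: " + str(original.get("Subject", ""))
    fwd.set_content(note)
    # Attaching a message object yields a message/rfc822 body part
    # inside a multipart/mixed container.
    fwd.add_attachment(original)
    return fwd
```

The forwarded copy keeps its own header intact inside the attachment,
which is what distinguishes this from pasting the text inline.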

5.2.3 Replying

New message (therefore different reverse path) which includes fragments of
the original.

This section needs to mention the Re: convention for the Subject: and the
propagation of the original Message-ID: into the new References: and
In-Reply-To: fields.
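
Those conventions could be sketched as follows. The function name is
invented, and the sketch replies only to the From: address and assumes
the original carries a Message-ID:.

```python
from email.message import EmailMessage


def make_reply(original: EmailMessage, sender: str, body: str):
    """Build a reply with Re:, In-Reply-To: and References: filled in."""
    reply = EmailMessage()
    reply["From"] = sender
    reply["To"] = str(original["From"])

    # Add "Re: " only once, however long the thread gets.
    subject = str(original.get("Subject", ""))
    if not subject.lower().startswith("re:"):
        subject = "Re: " + subject
    reply["Subject"] = subject

    # Point at the parent and extend the chain of ancestors.
    parent_id = str(original["Message-ID"]).strip()
    reply["In-Reply-To"] = parent_id
    ancestors = str(original.get("References", "")).split()
    reply["References"] = " ".join(ancestors + [parent_id])

    reply.set_content(body)
    return reply
```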

----

Finally some trivia...

1.2. Discussion Venue.

Is it not worth mentioning <asrg(_at_)ietf(_dot_)org>?

2.2 Domain Names

"sub-names" should be "labels" perhaps?

Typos: (section 3.1.4) "imitative" should be "initiative",
"MSA" should be "MDA".

-- 
Tony Finch  <dot(_at_)dotat(_dot_)at>  http://dotat.at/