Re: more comments on draft-crocker-email-arch-00


Tony,

Good comments. Here are some responses to them...


TF> e2) transport layer i.e. SMTP.

TF> Other implementations include LMTP, batch SMTP, UUCP, and way back in the
TF> ARPANET days, FTP.

Rather than writing a general discussion of global email, I decided to
limit references to current Internet specifications.  The FTP MAIL and
MLFL commands have died away and UUCP was never part of the set, so
they are left out.  That does not make them any less important, merely
not within scope for a current description of the main stream of
Internet mail.



TF> Originally these were host-based source-routed addresses, e.g.
TF> <@smtp.hermes.cam.ac.uk:dcrocker(_at_)brandenburg(_dot_)com> or

Actually, those came later. The original Arpanet mail was user(_at_)host,
with <host> specifying an entry in a single, flat name space and
mapping directly to a "globally" accessible Arpanet address.

There were source-routing hacks done unofficially, such as bang for
UUCP (early) and percent for CSNET (later) but it was not until RFC733
or RFC822 (I forget which) that source routing had an official syntax.

It is worth noting that source routing has consistently shown poor
scaling characteristics and has never been a meaningful part of
Internet mail.  The UUCP experiment is certainly noteworthy, both for
its success and its limits.  But it was not replicated in the Internet
suite.


TF> e4) message content

TF> Originally email was just bare ASCII, but now we have MIME. As well as
TF> MIME this layer includes most of the RFC 822 header.

There is certainly a legitimate argument in favor of counting the body
and the RFC2822 headers together, since the meat of a message's
semantics well might be in some of those headers.

However the usual view is that the headers have meta-content, whereas
the body has content.  This distinction is exacerbated -- I use the
word since all of this is consistently confusing -- by the presence
of header information that is really part of the handling envelope
rather than the user payload.  sigh.


TF> An odd result of the way I have divided the layers is that part of an
TF> addressing field is in the content layer (the display-name which may be
TF> MIME-encoded) and part is in the address layer (er, the address itself).

Not really.  The transport-level use does not have a display-name
field.  The RFC2822-level is really the end-user content information
(both display and address), rather than being used by transport.
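
To make the distinction concrete, here is a minimal Python sketch using
the standard email package.  The addresses and the variable names are
made up for illustration; nothing here is taken from the draft.

```python
from email.message import EmailMessage
from email.utils import getaddresses

msg = EmailMessage()
msg["From"] = "Dave Crocker <dcrocker@example.com>"  # hypothetical addresses
msg["To"] = "Tony <tony@example.org>"
msg["Subject"] = "layers"
msg.set_content("hello")

# RFC2822 level: display-name plus address, meant for the end user.
header_pairs = getaddresses(msg.get_all("To", []))

# Transport level: bare addresses only, as they would be given in
# MAIL FROM and RCPT TO.  There is no place for a display-name here.
envelope_from = "dcrocker@example.com"
envelope_rcpts = [addr for _name, addr in header_pairs]
```

The display-name survives only in header_pairs; the envelope list
carries the address alone.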


TF> A tempting comparison is between email address aliasing (as in the Sieve
TF> redirect action) and NAT.

Not really.  NAT is about content-rewriting.  Aliasing is about simple
message forwarding, without altering the content of the message.


TF>  Many people in the MARID camp claim that email
TF> aliasing is evil bad and wrong because it breaks the assumption that the
TF> SMTP originator information must correspond to the email address
TF> originator information.

I am not aware of that assumption ever being present in Internet mail.

From very nearly the earliest days, mailing lists have been a popular
mechanism, leading to a different address in the TO field than in the
transport envelope.


TF> Identities.

TF> I'm not very happy with the concept of "identities" used in the draft. It
TF> bundles together layer e2 and layer e3 information

Having a discussion of identities, as separate from protocol layers,
was intentional. An address is the same, no matter the level in which
it is used.


TF>  -- there are big tables
TF> about "setting" identities, some of which settings only last for the
TF> lifetime of a connection and some of which for the lifetime of a message.
TF> Is a Received: trace field really an identity that is set? I would say
TF> that the EHLO domain is "stated" rather than "set".

Well, no, not really. The same host may (legitimately) assert
different identities for different sessions, so this is very much a
case of "setting".


TF> The draft also omits almost all consideration of the layer e4 identities
TF> apart from a brief mention of List-*. It would be worth including the
TF> Message-ID: of a message and how it refers to previous messages using the
TF> References: and In-Reply-To: fields.

Message Identifiers.  As a separate point of discussion?

Hmmmm.  Probably a good idea.
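
For reference, a rough Python sketch of how a reply might point back at
an earlier message through these fields.  The helper name
references_for_reply and the example.* domains are assumptions made for
illustration only.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def references_for_reply(parent_references, parent_id):
    """Parent's References (if any) followed by the parent's own Message-ID."""
    return (parent_references + " " + parent_id).strip()


# Hypothetical original message.
parent_id = make_msgid(domain="example.com")
original = EmailMessage()
original["Message-ID"] = parent_id
original["Subject"] = "email-arch"

# Reply: gets its own identifier and refers back to the parent.
reply = EmailMessage()
reply["Message-ID"] = make_msgid(domain="example.org")
reply["In-Reply-To"] = parent_id
reply["References"] = references_for_reply("", parent_id)
```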


TF> A quick note on 2.1 Mailbox Addresses:
TF> "returned to its originator" implies that local parts are created by
TF> the entity identified by the domain part of an address.

Formally, it is the owner of the domain specified on the right-hand
side that defines what is legal on the left-hand side of addresses
associated with that domain. And it is the owner of the domain who
assigns the "mailbox" portion of the left-hand side. So yes, they DO
create the string. The MUA might put it into the From field, but the
domain owner defined (created) it.

The case of sub-addressing is interesting, primarily because it is
both common and non-standard.  So it might be worth mentioning as a
construct, but we cannot say anything about "standard" practice,
without devolving into a current practices document, rather than an
architecture document.  (And, yes, a current practices document would
be good to have; it's just not the goal of the current paper.)
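
As an illustration of the construct only, here is a small Python
sketch.  The "+" separator is an assumption; each domain owner picks
its own convention (or none), which is exactly why nothing standard can
be said about it.  The function name and address are hypothetical.

```python
def split_subaddress(address, separator="+"):
    """Split 'mailbox+detail@domain' into (mailbox, detail, domain).

    The separator is a local convention of the receiving domain,
    not something defined by the Internet mail standards.
    """
    local, _, domain = address.rpartition("@")
    mailbox, _, detail = local.partition(separator)
    return mailbox, detail, domain


parts = split_subaddress("dcrocker+ietf@example.com")
```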


TF> 3.1.1 / 3.1.2. Message submission.
TF> The draft states that the Sender: is set by the MUA.
TF> It is often overriden
TF> by the MSA to refer to the authenticated address of the sender, as a
TF> protection against spoofing.

This is a point worthy of some debate, I suspect.  Certainly an entity
later in a sequence can always override the work of an entity earlier
in the sequence.  Whether that is its architectural job is another
matter.


TF> The draft omits to say that the MSA will extract the BCC: addresses
TF> when it is creating the initial envelope recipient list, and remove the
TF> BCC: field or leave it empty.

good point.
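
A minimal Python sketch of that step, assuming the MSA is handed a
parsed message.  The function name build_envelope_recipients and the
addresses are invented for illustration, and this version removes the
field rather than leaving it empty.

```python
from email.message import EmailMessage
from email.utils import getaddresses


def build_envelope_recipients(msg):
    """Collect recipient addresses from To/Cc/Bcc, then drop Bcc from the header."""
    fields = []
    for name in ("To", "Cc", "Bcc"):
        fields.extend(msg.get_all(name, []))
    rcpts = [addr for _name, addr in getaddresses(fields) if addr]
    del msg["Bcc"]  # no error if the field is absent
    return rcpts


# Hypothetical message as it might arrive from the MUA.
msg = EmailMessage()
msg["To"] = "a@example.org"
msg["Cc"] = "B <b@example.org>"
msg["Bcc"] = "c@example.org"
msg.set_content("hello")

rcpts = build_envelope_recipients(msg)
```

After the call, the Bcc address is present in the envelope list but no
longer visible in the message header.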



TF> There should probably be some mention of other submission-time fix-ups,
TF> like creation of the Message-ID: and Date: fields.

The danger is getting caught talking about implementation rather than
architecture.  Still, something along these lines is worth adding.
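
For what it is worth, such fix-ups could look roughly like the Python
sketch below.  The function name apply_submission_fixups and the
default domain are assumptions; whether this work belongs to the MUA or
the MSA is the open question, and the code does not settle it.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def apply_submission_fixups(msg, domain="example.com"):
    """Add Date and Message-ID only when the submitter left them out."""
    if "Date" not in msg:
        msg["Date"] = formatdate(localtime=True)
    if "Message-ID" not in msg:
        msg["Message-ID"] = make_msgid(domain=domain)
    return msg


fixed = apply_submission_fixups(EmailMessage())
```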


TF> In practice the boundary between the MUA and MSA is more blurry than the
TF> draft depicts.

yup.


TF> 4.1. Envelope

TF> The description in the draft is rather different to the common meaning of
TF> "envelope". The word is usually used to mean the message transmission
TF> information that comes before DATA in the SMTP protocol.

I was rather pointedly trying to have the construct mean "handling
information" as it does more universally, rather than tying it to
SMTP. As you note, the Received headers are an example. They are
strictly part of the MTA envelope (handling trace) world, even though
they are placed in an RFC2822 header.
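
As a small illustration of handling trace living in the header, this
Python sketch reads the Received fields back out of a message.  The
host names, dates and message text are invented for the example.

```python
from email import message_from_string

# Hypothetical message; each MTA prepends its own Received field,
# so the newest hop comes first.
raw = (
    "Received: from mx.example.net by mail.example.org; Mon, 1 Mar 2004 10:00:02 +0000\n"
    "Received: from client.example.com by mx.example.net; Mon, 1 Mar 2004 10:00:01 +0000\n"
    "From: dcrocker@example.com\n"
    "To: tony@example.org\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)

hops = msg.get_all("Received", [])
# Reverse to get the order in which the message actually travelled,
# keeping only the part before the date-time.
path = [hop.split(";")[0].strip() for hop in reversed(hops)]
```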


TF>  Some of it may
TF> appear in the header (e.g. in the Received: trace if there's only one
TF> recipient, or in Return-Path: after final delivery), but that's an
TF> after-the-fact record of what was going on.

Received headers never appear in the SMTP protocol.


TF> 5. Two levels of store-and-forward

TF> I'm rather unhappy with this section, especially the title -- I'm not sure
TF> if there are levels as such and I'm not sure if there are two of them. The
TF> list of actions is fine, though I would present them in a different order.
TF> I prefer to define the actions in terms of what is done rather than who is
TF> doing it, because the MTA/MDA/MUA distinction gets very blurred

The purpose of this section is to highlight the confusion that has
persisted and define a resolution to it. Over the years, I've tended
to be pretty loose about considering mailing lists to be part of the
transfer system, versus being a user-level process. Discussions this
year have eliminated my own sense of ambiguity about this. In
particular, discussions trying to resolve who the "responsible"
posting Sender is have made the architectural issue crystal clear to
me.


As for the number of levels, two is a good place to start. We can
always start a recursive sequence with the upper layer, should it
prove necessary. At the moment, I'm far more concerned about getting
us all to clearly distinguish between basic transfer to a listed
recipient, versus any higher-level process that might choose to
forward it on, on its own authority.


TF> -- e.g.
TF> 5.2.5 is titled "MUA alias handling" but goes on to talk about MDAs (which
TF> is where it is usually implemented, though it may occur in the MTA or
TF> MUA).

"May occur in the MTA or MUA" strikes me as a pretty good definition
of MDA...


TF> This approach also escapes from the false MTA/MUA dichotomy.

and that false dichotomy would be what, exactly?


TF> There are two important things that may occur when a message is passed on:
TF> its reverse path may change or not (an e3 alteration by my layering), and
TF> its content may change or not (an e4 alteration). These two are
TF> independent of each other.

Actually, almost anything might occur, each being important.  That's
rather the important aspect of higher-level forwarding.  A new entity
is taking responsibility for the message -- including responsibility
for its contents -- and well might change any aspect.

Anyhow, that's what drove me to a table for each function.


TF> I think there are roughly two kinds of gateway, which should be kept
TF> more distinct in this document.
TF> Security gateways that do content filtering, but otherwise act like MTAs.

Those are called firewalls.  Real gateways are about translation.


TF> Gateways that translate Internet email into a technically different
TF> messaging environment. These make syntactic changes that should try to
TF> preserve semantics. I'm not sure that referring to parts of Internet
TF> message standards in this context is helpful.

The failure to acknowledge and deal with gateways as an architectural
construct has been persistent in networking, especially including
email. That said, Internet mail did comprehensive work on gatewaying
with X.400 and, more recently, on gatewaying between fax and email.

Gateways are a fact of life and they are likely to stay that way.  It
is essential to deal with that fact realistically, neither ignoring
them nor letting them dictate the core service being specified.


TF> 5.2.3 Replying
TF> This section needs to mention the Re: convention for the Subject: and the
TF> propagation of the original Message-ID: into the new References: and
TF> In-Reply-To: fields.

By and large, I was not trying to document the myriad common
practices outside the standards.  Instead the idea was to use the
stuff that has been formally documented as the basis for developing
architectural constructs.


TF> 1.2. Discussion Venue.
TF> Is it not worth mentioning <asrg(_at_)ietf(_dot_)org>?

I don't think so. That group is not tasked with -- or likely to be --
changing Internet mail architecture. Again, I was not trying to
provide reportorial thoroughness of the day's email activities, but
merely cite reasonable "strategic" venues to consider.


TF> 2.2 Domain Names
TF> "sub-names" should be "labels" perhaps?

I don't understand.


Again, thanks for the detailed review.

d/
--
 Dave Crocker <mailto:dcrocker(_at_)brandenburg(_dot_)com>
 Brandenburg InternetWorking <http://www.brandenburg.com>
 Sunnyvale, CA  USA <tel:+1.408.246.8253>, <fax:+1.866.358.5301>