I'd been slightly puzzled by one aspect of the CID PRA algorithm when
I first read the document, but I wasn't aware of any suitable forum in
which to discuss the CID draft.
Now that a very similar algorithm lives on in marid-core, I'll raise
the issue here.
Refer to the algorithm in marid-core, section 4.
Step (2) involves taking the first mailbox in the Resent-From header.
Step (5) involves taking the first mailbox in the From header.
I'll address step (5) first. If we've got here, we have no Sender
header, so having multiple addresses in the From field means that the
message is malformed (both according to RFC822 and RFC2822). What is
the rationale for taking the first address in the From field of a
malformed message, rather than just rejecting it as "hopelessly
malformed"? Is this kind of malformed message prevalent in the wild?
Furthermore, what if the message contains multiple From headers? This
is invalid, but in the absence of a Sender header, the PRA algorithm
will select the first From header and ignore the rest. This strikes
me as bad, since it's possible that some MUAs might instead select the
last From header for display.
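To make the multiple-From concern concrete, here is a small Python
sketch using the standard library email package. The addresses are
made up, and the MUA that displays the last From header is purely
hypothetical; the point is only that "first" and "last" can disagree.

```python
from email import message_from_string

# An invalid message with two From headers and no Sender header.
# Both addresses are invented for illustration.
RAW = (
    "From: alice@example.com\n"
    "From: mallory@example.net\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

msg = message_from_string(RAW)
all_from = msg.get_all("From")   # every From header, in message order

checked_by_pra = all_from[0]     # what a "first From header" rule picks
shown_by_mua = all_from[-1]      # what a hypothetical "last wins" MUA shows

print("checked:", checked_by_pra)
print("shown:  ", shown_by_mua)
```

If the two values differ, the MARID check has been applied to an
address the user never sees.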
A similar thing can be said about step (2), namely that if we reach
this step then we don't have a corresponding Resent-Sender header, so
multiple mailboxes in the Resent-From renders the message malformed.
I suppose it's just conceivable, though, that the headers might be in
a strange order and hence confuse step (1) into disregarding a
relevant Resent-Sender header.
The issue here is that RFC822 section 4.1 states that the header
fields are not required to appear in any particular order, although it
does say (section 4.2) that the interpretation of multiple Resent-*
fields of the same type is undefined. So an RFC822 MUA could, if it
wanted to be really obtuse, prepend a Resent-From to the existing
headers, and *append* a Resent-Sender. (Hopefully it would remove any
preexisting Resent-* headers, though, to avoid emitting a message whose
semantics are undefined.) Step (1) would then (incorrectly) select
the Resent-From (which might contain multiple mailboxes) rather than
the Resent-Sender, always presuming there are one or more intervening
Received headers. This seems an unlikely scenario to me in practice,
though.
With RFC2822 MUAs, the situation should be somewhat better, since the
syntax (section 3.6) constrains the order of the headers. I'm a
little concerned that 3.6 only says that trace headers SHOULD be kept
in blocks prepended to the message. Given the syntax of the "fields"
production, I'm not quite sure what the authors were envisaging an MUA
might do that would violate the SHOULD but still conform to the
syntax.
Still, I'd find it highly surprising if any real-world mailer that is
conformant to either RFC822 or RFC2822 would result in step (2) being
reached with multiple mailboxes in the Resent-From header. Am I
mistaken in this belief?
And whether or not such messages occur in the wild, is it really
helpful to guess that the first mailbox is the right place to apply
MARID checks in the case of malformed messages or in the case of
really bizarre Resent-* corner cases? Or does it make more sense for
the PRA algorithm simply to fail to extract an address, and treat the
message as "hopelessly malformed" as it would if the message lacked
any originator headers?
Also, section 4 says:
    The purported responsible address of a message is determined by the
    first from the following list of items that is present, non-empty,
    and is a syntactically valid e-mail address:
This means that if I place a syntactically invalid address in the
Sender header, the PRA algorithm will ignore it and use the From
header for MARID checks. What if my MUA fails to notice that the
address is invalid, and displays it as the purported sender address,
despite the MARID checks having been done on the From header instead?
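A rough Python sketch of that fall-through, under some loud
assumptions: the validity test is a crude regular expression of my own
(not anything from marid-core), headers are assumed to hold a bare
addr-spec with no display name, and only the Sender and From items of
the list are modelled.

```python
import re
from email import message_from_string

# Crude stand-in for "syntactically valid e-mail address" (an assumption).
ADDR_RE = re.compile(r"^[^@\s]+@[^@\s]+$")


def first_valid(msg, names):
    """Return (header name, address) for the first header in `names` that
    is present, non-empty and passes ADDR_RE; None if there is none."""
    for name in names:
        value = (msg.get(name) or "").strip()
        if value and ADDR_RE.match(value):
            return name, value
    return None


RAW = (
    "Sender: roy at example.org\n"      # deliberately invalid
    "From: someone@example.com\n"
    "\n"
    "body\n"
)
msg = message_from_string(RAW)

checked = first_valid(msg, ["Sender", "From"])
displayed = msg["Sender"]   # what a lenient, hypothetical MUA might show
print("checked:  ", checked)
print("displayed:", displayed)
```

The invalid Sender is silently skipped, so the check lands on the From
header while a lenient MUA could still display the Sender value.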
As a strawman, here is an alternative which addresses my concerns. It
is intentionally a minimal set of modifications to the current
algorithm, and is intended to return the same result as the existing
algorithm in all but the corner cases. I'm sure it can be made
clearer and less wordy.
4. Determining the Purported Responsible Address
The purported responsible address (PRA) of a message is determined
by the following algorithm:
1. Locate the first non-empty Resent-Sender header in the message.
If no such header is found, proceed to step 2. If it is
preceded by a non-empty Resent-From header and one or more
Received or Return-Path headers occur after said Resent-From
header and before the Resent-Sender header, proceed to step
2. If the Resent-Sender header is hopelessly malformed (eg it
appears to contain multiple mailboxes, or if the single mailbox
is hopelessly malformed) then exit, without returning a PRA.
Otherwise exit, returning the mailbox from the Resent-Sender
header as the PRA.
2. Locate the first non-empty Resent-From header in the message.
If no such header is found, proceed to step 3. If it is
hopelessly malformed (eg one or more mailboxes in the header
are hopelessly malformed) then exit without returning a PRA.
If it contains multiple mailboxes, then exit without returning
a PRA. Otherwise exit, returning the single mailbox from the
Resent-From header as the PRA.
3. Locate the first non-empty Delivered-To, X-Envelope-To or
Envelope-To header in the message. If no such header is
present, proceed to step 4. If it appears to contain multiple
mailboxes, then exit without returning a PRA. Otherwise, treat
the contents of the header as a mailbox, and exit returning
this mailbox as the PRA (unless the mailbox is hopelessly
malformed, in which case exit without returning a PRA).
[Note. This means that a non-standard header that does *not*
contain a single valid mailbox will cause the PRA algorithm to
fail and may cause the message to be rejected. But anything
else means that the process doing the MARID checks might make a
different decision as to the validity of the mailbox from a
subsequent MUA which attempts to display the purported
responsible address by parsing the headers. Maybe it would be
better to drop this step, and go back to only considering
RFC(2)822 headers?]
4. Locate all the non-empty Sender headers in the message. If
there are no such headers, continue to step 5. If there are
multiple such headers, exit without returning a PRA. If the
single non-empty Sender header is hopelessly malformed (eg if
it appears to contain multiple mailboxes, or if the single
mailbox is hopelessly malformed), exit without returning a PRA.
Otherwise, exit returning the mailbox from the Sender header as
the PRA.
5. Locate all the non-empty From headers in the message. If there
are no such headers, or multiple such headers, exit without
returning a PRA. If the single non-empty From header is
hopelessly malformed (eg it contains one or more mailboxes that
are hopelessly malformed) then exit without returning a PRA.
If it contains multiple mailboxes, exit without returning a
PRA. Otherwise, return the single mailbox from the From header
as the PRA.
The purported responsible domain of the message is the domain part
of the purported responsible address returned by the above
algorithm.
If the above algorithm fails to return a PRA, or if the PRA does
not contain a domain, then an MTA SHOULD reject the message as
hopelessly malformed.
What constitutes a hopelessly malformed header or a hopelessly
malformed mailbox is a matter for local policy.
[Note: such local policy will never cause two implementations to
return different PRAs. However, it may cause one implementation to
return a PRA where another implementation does not. The result is
that corner cases may result in messages of questionable
deliverability, but they will never result in an MTA doing MARID
checks on a different PRA from the address that a PRA-aware MUA
chooses to display, even if they make their decisions
independently.]
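For anyone who wants to poke at the corner cases, the strawman steps
above can be sketched in Python roughly as follows. This is only an
illustration, not a reference implementation. The names pra,
pra_domain and single_mailbox are my own, and single_mailbox stands in
for the "local policy" on what is hopelessly malformed: it accepts
only "Name <local@domain>" or a bare "local@domain", and naively
treats any comma as a sign of multiple mailboxes.

```python
import re
from email import message_from_string

# Hypothetical local policy for a well-formed mailbox (an assumption):
# either "Display Name <local@domain>" or a bare "local@domain".
_MAILBOX = re.compile(
    r"^(?:[^<>,]*<([^<>@\s]+@[^<>@\s]+)>|([^<>@\s]+@[^<>@\s]+))$"
)


def single_mailbox(value):
    """Return the lone addr-spec in value, or None if it looks malformed
    or appears to hold several mailboxes (naively: any comma)."""
    if "," in value:
        return None
    m = _MAILBOX.match(value.strip())
    return (m.group(1) or m.group(2)) if m else None


def pra(msg):
    """Return the PRA as a string, or None for 'exit without a PRA'."""
    headers = [(name.lower(), value.strip()) for name, value in msg.items()]
    nonempty = [(i, n, v) for i, (n, v) in enumerate(headers) if v]

    def first(*names):
        return next(((i, v) for i, n, v in nonempty if n in names), None)

    def all_of(name):
        return [v for _, n, v in nonempty if n == name]

    # Step 1: first non-empty Resent-Sender, unless an earlier non-empty
    # Resent-From is separated from it by Received or Return-Path.
    found = first("resent-sender")
    if found is not None:
        rs_index, rs_value = found
        superseded = any(
            n == "resent-from"
            and i < rs_index
            and any(h in ("received", "return-path")
                    for h, _ in headers[i + 1:rs_index])
            for i, n, _ in nonempty
        )
        if not superseded:
            return single_mailbox(rs_value)

    # Step 2: first non-empty Resent-From, which must hold one mailbox.
    found = first("resent-from")
    if found is not None:
        return single_mailbox(found[1])

    # Step 3: first non-empty Delivered-To, X-Envelope-To or Envelope-To.
    found = first("delivered-to", "x-envelope-to", "envelope-to")
    if found is not None:
        return single_mailbox(found[1])

    # Step 4: exactly one non-empty Sender header, holding one mailbox.
    senders = all_of("sender")
    if senders:
        return single_mailbox(senders[0]) if len(senders) == 1 else None

    # Step 5: exactly one non-empty From header, holding one mailbox.
    froms = all_of("from")
    return single_mailbox(froms[0]) if len(froms) == 1 else None


def pra_domain(address):
    """Domain part of a PRA, or None when no PRA was returned."""
    return address.rpartition("@")[2] if address else None
```

Feeding it a message parsed with message_from_string() returns either
a single address or None; None corresponds to "exit without returning
a PRA", which is the case where an MTA SHOULD reject. A malformed
Sender yields None here rather than falling through to From.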
-roy