comment on draft-ietf-marid-core-00: PRA algorithm (long)



I'd been slightly puzzled by one aspect of the CID PRA algorithm when
I first read the document, but I wasn't aware of any suitable forum in
which to discuss the CID draft.

Now that a very similar algorithm lives on in marid-core, I'll raise
the issue here.

Refer to the algorithm in marid-core, section 4.

Step (2) involves taking the first mailbox in the Resent-From header.

Step (5) involves taking the first mailbox in the From header.

I'll address step (5) first.  If we've got here, we have no Sender
header, so having multiple addresses in the From field means that the
message is malformed (according to both RFC822 and RFC2822).  What is
the rationale for taking the first address in the From field of a
malformed message, rather than just rejecting it as "hopelessly
malformed"?  Is this kind of malformed message prevalent in the wild?

Furthermore, what if the message contains multiple From headers?  This
is invalid, but in the absence of a Sender header, the PRA algorithm
will select the first From header and ignore the rest.  This strikes
me as bad, since it's possible that some MUAs might instead select the
last From header for display.
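
To illustrate the concern, here is a small Python sketch using the
standard library email package.  The message and both addresses are
made up, and the "MUA that prefers the last header" is a hypothetical
I have not observed in any particular client.

```python
from email import message_from_string

# Hypothetical malformed message: two From headers and no Sender.
RAW = (
    "From: alice@example.com\n"
    "From: mallory@example.net\n"
    "Subject: test\n"
    "\n"
    "body\n"
)

msg = message_from_string(RAW)
from_headers = msg.get_all("From", [])  # all From values, in message order

# What a checker following the "first From header" rule would examine.
checked = from_headers[0]
# What a (hypothetical) MUA that prefers the last header would display.
displayed = from_headers[-1]

print(len(from_headers), checked, displayed)
```

The point is only that the checked address and the displayed address
can differ once the message carries more than one From header.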

A similar thing can be said about step (2), namely that if we reach
this step then we don't have a corresponding Resent-Sender header, so
multiple mailboxes in the Resent-From renders the message malformed.
I suppose it's just conceivable, though, that the headers might be in
a strange order and hence confuse step (1) into disregarding a
relevant Resent-Sender header.

The issue here is that RFC822 section 4.1 states that the header
fields are not required to appear in any particular order, although it
does say (section 4.2) that the interpretation of multiple Resent-*
fields of the same type is undefined.  So an RFC822 MUA could, if it
wanted to be really obtuse, prepend a Resent-From to the existing
headers, and *append* a Resent-Sender.  (Hopefully it would remove any
preexisting Resent-* headers, though, to avoid emitting a message whose
semantics are undefined.)  Step (1) would then (incorrectly) select
the Resent-From (which might contain multiple mailboxes) rather than
the Resent-Sender, always presuming there are one or more intervening
Received headers.  This seems an unlikely scenario to me in practice,
though.

With RFC2822 MUAs, the situation should be somewhat better, since the
syntax (section 3.6) constrains the order of the headers.  I'm a
little concerned that 3.6 only says that trace headers SHOULD be kept
in blocks prepended to the message.  Given the syntax of the "fields"
production, I'm not quite sure what the authors were envisaging an MUA
might do that would violate the SHOULD but still conform to the
syntax.

Still, I'd find it highly surprising if any real world mailer that is
conformant to either RFC822 or RFC2822 would result in step 2 being
reached with multiple mailboxes in the Resent-From header.  Am I
mistaken in this belief?

And whether or not such messages occur in the wild, is it really
helpful to guess that the first mailbox is the right place to apply
MARID checks in the case of malformed messages or in the case of
really bizarre Resent-* corner cases?  Or does it make more sense for
the PRA algorithm simply to fail to extract an address, and treat the
message as "hopelessly malformed" as it would if the message lacked
any originator headers?

Also, section 4 says:

   The purported responsible address of a message is determined by the 
   first from the following list of items that is present, non-empty, 
   and is a syntactically valid e-mail address: 
    
This means that if I place a syntactically invalid address in the
Sender header, the PRA algorithm will ignore it and use the From
header for MARID checks.  What if my MUA fails to notice that the
address is invalid, and displays it as the purported sender address,
despite the MARID checks having been done on the From header instead?
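
A rough Python sketch of that mismatch follows.  It is a deliberate
simplification: only Sender and From are considered, looks_valid() is
a crude stand-in for real address syntax checking, and
naive_mua_display() is a hypothetical MUA that does no validation at
all.  None of these names come from the draft.

```python
from email import message_from_string


def looks_valid(value):
    """Crude, hypothetical validity test: local@domain with no spaces."""
    value = value.strip()
    local, at, domain = value.partition("@")
    return bool(local and at and domain) and " " not in value


def first_valid_pick(msg):
    """Mimic the 'first item that is present and valid' rule (simplified)."""
    for name in ("Sender", "From"):
        value = msg.get(name)
        if value and looks_valid(value):
            return value.strip()
    return None


def naive_mua_display(msg):
    """Hypothetical MUA: shows Sender if present, else From, unvalidated."""
    return (msg.get("Sender") or msg.get("From") or "").strip() or None


RAW = (
    "Sender: broken-address\n"
    "From: alice@example.com\n"
    "\n"
    "body\n"
)
msg = message_from_string(RAW)

checked = first_valid_pick(msg)   # address the MARID check would use
shown = naive_mua_display(msg)    # address the user would see
```

Here the check is applied to the From address while the hypothetical
MUA displays the invalid Sender value.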

As a strawman, here is an alternative which addresses my concerns.  It
is intentionally a minimal set of modifications to the current
algorithm, and is intended to return the same result as the existing
algorithm in all but the corner cases.  I'm sure it can be made
clearer and less wordy.

4. Determining the Purported Responsible Address 

   The purported responsible address (PRA) of a message is determined
   by the following algorithm:
    
   1.  Locate the first non-empty Resent-Sender header in the message.
       If no such header is found, proceed to step 2.  If it is
       preceded by a non-empty Resent-From header and one or more
       Received or Return-Path headers occur after said Resent-From
       header and before the Resent-Sender header, proceed to step 2.
       If the Resent-Sender header is hopelessly malformed (eg it
       appears to contain multiple mailboxes, or if the single mailbox
       is hopelessly malformed) then exit, without returning a PRA.
       Otherwise exit, returning the mailbox from the Resent-Sender
       header as the PRA.
 
   2.  Locate the first non-empty Resent-From header in the message.
       If no such header is found, proceed to step 3.  If it is
       hopelessly malformed (eg one or more mailboxes in the header
       are hopelessly malformed) then exit without returning a PRA.
       If it contains multiple mailboxes, then exit without returning
       a PRA.  Otherwise exit, returning the single mailbox from the
       Resent-From header as the PRA.
 
   3.  Locate the first non-empty Delivered-To, X-Envelope-To or
       Envelope-To header in the message.  If no such header is
       present, proceed to step 4.  If it appears to contain multiple
       mailboxes, then exit without returning a PRA.  Otherwise, treat
       the contents of the header as a mailbox, and exit returning
       this mailbox as the PRA (unless the mailbox is hopelessly
       malformed, in which case exit without returning a PRA).

       [Note.  This means that a non-standard header that does *not*
       contain a single valid mailbox will cause the PRA algorithm to
       fail and may cause the message to be rejected.  But anything
       else means that the process doing the MARID checks might make a
       different decision as to the validity of the mailbox from a
       subsequent MUA which attempts to display the purported
       responsible address by parsing the headers.  Maybe it would be
       better to drop this step, and go back to only considering
       RFC(2)822 headers?]
  
   4.  Locate all the non-empty Sender headers in the message.  If
       there are no such headers, continue to step 5.  If there are
       multiple such headers, exit without returning a PRA.  If the
       single non-empty Sender header is hopelessly malformed (eg if
       it appears to contain multiple mailboxes, or if the single
       mailbox is hopelessly malformed), exit without returning a PRA.
       Otherwise, exit returning the mailbox from the Sender header as
       the PRA.
 
   5.  Locate all the non-empty From headers in the message.  If there
       are no such headers, or multiple such headers, exit without
       returning a PRA.  If the single non-empty From header is
       hopelessly malformed (eg it contains one or more mailboxes that
       are hopelessly malformed) then exit without returning a PRA.
       If it contains multiple mailboxes, exit without returning a
       PRA.  Otherwise, return the single mailbox from the From header
       as the PRA.
 
   The purported responsible domain of the message is the domain part
   of the purported responsible address returned by the above
   algorithm.
    
   If the above algorithm fails to return a PRA, or if the PRA does
   not contain a domain, then an MTA SHOULD reject the message as
   hopelessly malformed.

   What constitutes a hopelessly malformed header or a hopelessly
   malformed mailbox is a matter for local policy.

   [Note: such local policy will never cause two implementations to
   return different PRAs.  However it may cause one implementation to
   return a PRA where another implementation does not.  The result is
   that corner cases may result in messages of questionable
   deliverability, but they will never result in an MTA doing MARID
   checks on a different PRA from the address that a PRA-aware MUA
   chooses to display, even if they make their decisions
   independently.]
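
To make the strawman a little more concrete, here is a rough Python
sketch of the five steps, using the standard library email package.
It is not a reference implementation.  The function names (hopeless,
single_mailbox, pra) are mine, hopeless() is just one possible local
policy, and in step 1 I have assumed that "preceded by a non-empty
Resent-From header" refers to the first such header in the message.

```python
from email import message_from_string
from email.utils import getaddresses

TRACE = ("received", "return-path")


def hopeless(addr):
    """One possible local policy: anything that is not local@domain."""
    local, at, domain = addr.rpartition("@")
    return not (local and at and domain)


def single_mailbox(value):
    """Return the sole usable mailbox in a header value, else None."""
    found = [addr for _, addr in getaddresses([value])]
    if len(found) != 1 or hopeless(found[0]):
        return None
    return found[0]


def pra(msg):
    """Return the PRA of msg, or None if the algorithm exits without one."""
    # Non-empty headers only, in message order, with lower-cased names.
    hdrs = [(n.lower(), v) for n, v in msg.items() if v.strip()]

    def first(*names):
        for i, (n, _) in enumerate(hdrs):
            if n in names:
                return i
        return None

    # Step 1: first Resent-Sender, unless a trace header sits between an
    # earlier Resent-From and it (assumption: the first Resent-From).
    rs, rf = first("resent-sender"), first("resent-from")
    if rs is not None:
        between = hdrs[rf + 1:rs] if rf is not None and rf < rs else []
        if not any(n in TRACE for n, _ in between):
            return single_mailbox(hdrs[rs][1])
    # Step 2: first Resent-From.
    if rf is not None:
        return single_mailbox(hdrs[rf][1])
    # Step 3: first Delivered-To, X-Envelope-To or Envelope-To.
    env = first("delivered-to", "x-envelope-to", "envelope-to")
    if env is not None:
        return single_mailbox(hdrs[env][1])
    # Steps 4 and 5: exactly one Sender, else exactly one From.
    for name in ("sender", "from"):
        values = [v for n, v in hdrs if n == name]
        if len(values) > 1:
            return None
        if values:
            return single_mailbox(values[0])
    return None


def parse(*lines):
    """Helper for the examples: build a message from header lines."""
    return message_from_string("\n".join(lines) + "\n\nbody\n")


one_from = pra(parse("From: alice@example.com"))
two_froms = pra(parse("From: alice@example.com", "From: bob@example.net"))
two_boxes = pra(parse("From: alice@example.com, bob@example.net"))
bad_sender = pra(parse("Sender: broken-address", "From: alice@example.com"))
```

In this sketch one_from is the single From mailbox, while the other
three cases all come back as None, i.e. the algorithm exits without
returning a PRA rather than guessing.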

        -roy