
Re: RFC2821bis-02 Issue 27: Received clauses

2007-04-23 18:28:57

On 2007-04-23, Pete Resnick wrote:

And here I was thinking that I was going to get to duck all of this.

It may be worse than you think...

Some confusion in terminology (and some broken syntax in 2822) led to 
a short discussion resulting in the following:

- 2821bis and 2822upd will both define "Header Section" as the 
complete set of header fields at the beginning of a message.

"Message Header Section" might also avoid the ambiguity with the set(s)
of header fields which occur in composite MIME messages (RFC 2046

- 2821bis and 2822upd will both define "Header Field" as a line in a 
message with a field name followed by a colon followed by a field 

- 2821bis and 2822upd will both avoid using the terms "header" and 
"headers" in isolation to avoid ambiguity.

Minor nit above notwithstanding, this is an improvement in consistency
(e.g. with FYI18) and will help reduce ambiguity.  Thanks.

- 2821bis will use the term "received clause" to refer to a keyword 
and the set of stuff that follows it before the next keyword (so, 
"for presnick(_at_)qualcomm(_dot_)com" is a received clause).

OK -- I suggest using an ABNF rule name for the construct of "clause"
(rather than -- or in addition to -- 2822 "name-val-pair") to make the
association between RFC text and ABNF clearer.

- 2822upd will simplify the syntax of received clauses (what is now 
"name-val-list") such that it's simply "*(word / domain / addr-spec / 
angle-addr)" and let 2821bis do the real definition.

Please don't forget the existing (2821/2822) discrepancies:
1. 2821 permits "String" which in turn includes "Quoted-string", while
   2822 does not allow "quoted-string" in a value-item [*]
2. 2821 specifies the "for" clause name as having a value of
   1 * ( Path / Mailbox ), whereas 2822 permits only a single
   addr-spec or one or more angle-addrs [technically, 2822 also permits
   an atom, a domain, or a msg-id] (821 permitted a single "path" and
   822 a single addr-spec); parsing multiple values attached to a name
   is quite difficult to do correctly and efficiently, and is
   unnecessary if multiple "for" clauses are permitted [822 permitted
   multiple "with" clauses, but not "for"; 821 permitted/required one
   instance each of "from" and "by", and at most one of "via", "with",
   "id", and "for"; 2822 permits any number of any named clauses;
   2821 requires single "from" and "by" (with different and non-*822
   syntax from 821) and at most one each of "with", "via", "id", and
   "for" clauses (the latter two also having different syntax from
   821 and *822)] [+] [&]
3. 2822 does not specify order of clauses; 822 and *821 specify order
   as "from" "by" "via" "with" "id" "for".
4. 2822 does not associate a value type with specific names, and
   therefore permits atrocities such as
      Received: from <id(_at_)domain(_dot_)org> by <foo(_at_)bar(_dot_)com> via
        with [] id ESMTP for UUCP ; 1 Jan 2007 01:23:45 -0600
   whereas 822 and *821 specify (inconsistently, unfortunately) the
   specific value types for each specified permitted name
5. 2821's structured "TCP-Info" is syntactically equivalent to an *822
   comment ["ignored by the formal semantics" [822]/"semantically
   interpreted as a single space character" [2822]]
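For illustration only, the ordering and cardinality constraints mentioned in items 2 and 3 can be sketched as a small check over the clause names found in a field. This is a rough sketch, not text from any of the RFCs: the function name check_clause_names and the wording of the problem strings are my own, and the rules encoded are only those summarized above (order "from" "by" "via" "with" "id" "for"; exactly one each of "from" and "by"; at most one of each of the others).

```python
# Clause order as summarized in item 3 (822 and *821).
CLAUSE_ORDER = ["from", "by", "via", "with", "id", "for"]
# Per item 2, 2821 requires a single "from" and a single "by".
REQUIRED = {"from", "by"}


def check_clause_names(names):
    """Return a list of problems found in a sequence of clause names.

    Hypothetical helper: it looks only at names, not at value syntax.
    """
    problems = []
    lowered = [n.lower() for n in names]

    for name in lowered:
        if name not in CLAUSE_ORDER:
            problems.append(f"unknown clause name: {name}")

    known = [n for n in lowered if n in CLAUSE_ORDER]
    for name in CLAUSE_ORDER:
        count = known.count(name)
        if count > 1:
            problems.append(f"clause repeated: {name}")
        if name in REQUIRED and count == 0:
            problems.append(f"missing required clause: {name}")

    # The names must appear in the same relative order as CLAUSE_ORDER.
    positions = [CLAUSE_ORDER.index(n) for n in known]
    if positions != sorted(positions):
        problems.append("clauses out of order")

    return problems


if __name__ == "__main__":
    print(check_clause_names(["from", "by", "with", "id", "for"]))
    print(check_clause_names(["by", "from"]))
    print(check_clause_names(["from", "by", "for", "for"]))
```

A check of this kind is only possible when names are distinguishable from values, which is the point of items 3 and 4.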

Also, please be very careful: a literal
  clause = *(word / domain / addr-spec / angle-addr)
would allow currently illegal atrocities as valid "clause"s such as
"with Microsoft SMTPSVC" [an error actually produced by certain
non-*82{1,2}-compliant software] and "from by with for id via";
both examples would be quite difficult to parse.  Sticking with the
historical and present specification of a name-value pair [ideally
with specified value type for each (registered!) name ] would be
preferable to introduction of new opportunities for interoperability
problems. [+] Retaining the requirement for a name-value pair could
be done as simply as:
  clause = name-val-pair CFWS
with name-val-pair and the implicit item-name and item-value much as
in 2822 [again, ideally with complete specification of value type as
in 822].

Although it may be tempting to specify a generic
field in 2822bis and punt the specifics to 2821bis, failing a MUST/
MUST NOT specification that states in effect that *only* SMTP receivers
are permitted to insert Received fields [#], that leaves the possibility
that some non-SMTP agent (and therefore not subject to 2821bis'
specification) will insert such a field having syntax incompatible
with the 2821bis specification; as a mail user agent or other agent
which may have to parse such fields is unable to determine whether a
given instance was inserted by an SMTP receiver (and subject to 2821bis
syntax) or by some other agent (with only a generic 2822bis
specification), it may be (depending on how loose the 2822bis
specification is) impossible in general to parse such fields.  2822 as
written hasn't quite fallen over that cliff, but it's teetering on the
edge:
  Received:;1 Jan 2007 02:34:56 -0700
is a legal 2822 Received field (with no substantive trace information).
And the "from by with for id via" atrocity above *is* legal 2822 if
parsed as three name-val-pairs ("by", "for" and "via" being legal
2822 "atom"s associated as values with "from", "with", and "id" names).
[The careful reader may note that that illustrates the precise problem
with non-paired "clause" constructs as well as with permitting multiple
values with a single name -- one cannot tell which tokens to group
together.]
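
The grouping problem can be shown with a toy tokenizer. This is a sketch under my own assumptions: tokens are split on whitespace only, and the helper names pair_tokens and split_on_keywords are hypothetical. The first reading pairs tokens strictly as name-val-pairs; the second starts a new clause at every keyword, as a keyword-delimited "clause" reading might.

```python
KEYWORDS = {"from", "by", "via", "with", "id", "for"}


def pair_tokens(tokens):
    """Group a flat token list into consecutive (name, value) pairs."""
    if len(tokens) % 2:
        raise ValueError("odd number of tokens; cannot form name-value pairs")
    return list(zip(tokens[0::2], tokens[1::2]))


def split_on_keywords(tokens):
    """Start a new clause at every keyword token."""
    clauses = []
    for tok in tokens:
        if tok.lower() in KEYWORDS or not clauses:
            clauses.append([tok])
        else:
            clauses[-1].append(tok)
    return clauses


if __name__ == "__main__":
    tokens = "from by with for id via".split()
    # Strict pairing: three name-val-pairs.
    print(pair_tokens(tokens))
    # Keyword-delimited: six clauses, none of which has a value.
    print(split_on_keywords(tokens))
```

The same six tokens yield two incompatible structures, and nothing in the field itself says which one the generating agent intended.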

Another issue worth mentioning w.r.t. Received fields is the special
treatment required by RFC 2047, viz. comments in Received fields are
forbidden from incorporating encoded-words (for no documented reason),
which are perfectly legal in all other structured fields which permit
comments. Therefore:
  Date: 1 Jan 2007 02:34:56 +0100
is perfectly legal (and somewhat informative), whereas
  Received:;1 Jan 2007 02:34:56 +0100
is quasi-legal (if and only if one considers the comment as not
containing an encoded-word, in which case it is uninformative gibberish
[more so had I used B encoding in the example rather than (for legibility)
Q encoding]).
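
To make the Q versus B legibility remark concrete, here is a small sketch using Python's standard email.header module. The sample text "café" and the charset iso-8859-1 are my own choices for illustration and are not the content of the examples above; decode_word is a hypothetical helper name.

```python
import base64
from email.header import decode_header, make_header

# Hypothetical comment text, chosen only for illustration.
text = "café"
raw = text.encode("iso-8859-1")

# The same text as an RFC 2047 encoded-word in Q and in B encoding.
q_word = "=?iso-8859-1?q?caf=E9?="
b_word = "=?iso-8859-1?b?" + base64.b64encode(raw).decode("ascii") + "?="


def decode_word(word):
    """Decode a single encoded-word to a Python string."""
    return str(make_header(decode_header(word)))


if __name__ == "__main__":
    # Q encoding leaves the ASCII letters readable; B encoding does not.
    print(q_word, "->", decode_word(q_word))
    print(b_word, "->", decode_word(b_word))
```

An agent that follows the RFC 2047 restriction would leave either form undecoded inside a Received comment, so the reader sees only the raw encoded-word.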

* although ABNF rule names are de jure case-insensitive, another potential
  source of confusion and ambiguity can be removed by making references to
  common ABNF rule names case-consistent between 2821bis and 2822bis, e.g.
  all lower-case "quoted-string"

+ given the historical and continued inconsistencies with the specified
  syntax of Received fields, as well as [clearly broken, though widely
  deployed] implementations which fail to conform to *any* of those
  specifications, and the unsuitability of the fields for the purpose of
  *reliably* tracing messages [$], it might be worthwhile to define a
  replacement successor to the Received field (which itself replaced RFC
  788's Mail-From field) in order to provide a clean slate

# I hope that the Security Considerations sections of both documents will
  acknowledge the fact that trace fields by their nature preclude
  end-to-end message (including message header section) integrity/privacy
  mechanisms. I.e. by modifying the message in transit (specifically by
  inserting new message header fields), any message integrity checksum
  over the entire message is rendered invalid.  It would be nice to do
  away with that problem, but doing so would require substantive changes
  to both specifications.
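
  A minimal demonstration of the checksum problem, assuming SHA-256 as
  the digest (my choice; neither specification names one) and made-up
  example.org / relay.example names:

```python
import hashlib


def digest(message: bytes) -> str:
    """Hex SHA-256 digest over the entire message, header section included."""
    return hashlib.sha256(message).hexdigest()


# Message as submitted by the originator (hypothetical content).
original = (
    b"From: a@example.org\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    b"body\r\n"
)

# A relay prepends a trace field; nothing else is touched.
trace = (
    b"Received: from client.example by relay.example; "
    b"1 Jan 2007 01:23:45 -0600\r\n"
)
relayed = trace + original

if __name__ == "__main__":
    # The original bytes survive intact at the end of the relayed message,
    # yet a digest over the whole message no longer matches.
    print(relayed.endswith(original))
    print(digest(original) == digest(relayed))
```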

$ viz. the trivially-forged and therefore unreliable HELO/EHLO name is
  recorded in the machine-readable "from" clause, whereas the reliable
  client IP address is (optionally!) relegated to a "semantically
  interpreted as a single space character" [quoting 2822] comment.
  This is a security-related issue with the *821 specification of
  Received fields (forgery may cause blame (for SPAM, copyright
  infringement, so-called "hate crimes", etc.) to be wrongly attributed).
  Note that RFC 788 did not specify use of the easily-forged HELO name
  in its Mail-From field.

& both documents' Security Considerations sections should note that
  "for" clauses are likely to compromise privacy and may compromise
  security by revealing information which should be confined to non-
  message "envelope" (i.e. [E]SMTP command parameters) data (the
  poster child case is Bcc recipient data revealed via [possibly
  multiple] "for" value angle-addrs)
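
  As a rough sketch of that exposure, the following scans a Received
  field value for "for" clauses with angle-addr values and reports any
  address absent from the visible recipient fields. The regular
  expression is deliberately naive (it ignores comments and quoted
  strings), and the names for_addresses and the example addresses are
  hypothetical:

```python
import re

# Naive pattern: the keyword "for" followed by one <addr>.
FOR_CLAUSE = re.compile(r"\bfor\s+<([^<>\s]+)>", re.IGNORECASE)


def for_addresses(received_value):
    """Return addresses appearing as angle-addr values of "for" clauses."""
    return FOR_CLAUSE.findall(received_value)


# Hypothetical trace field added for a Bcc recipient's copy.
received = (
    "from client.example by relay.example "
    "for <hidden@example.net>; 1 Jan 2007 01:23:45 -0600"
)
# Recipients shown in the To/Cc fields of the same message.
visible = {"to@example.org"}

leaked = [a for a in for_addresses(received) if a not in visible]

if __name__ == "__main__":
    # Any address listed here was envelope-only data until recorded.
    print(leaked)
```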