ietf-822

Received header Considered Pathetic (was Re: Revisiting RFC 2822 grammar (scratching the surface of Received issues))

2004-01-16 07:55:39

Bruce has raised an issue I've been thinking about a lot recently. The "Received" header is woefully inadequate for spam tracing, and has a history of syntactic cruft that will be hard to simplify. I would like to see us design a replacement built specifically to support rapid automated spam tracing.

I think that a combination of an easily parsed Received header with a well-designed Mail Source Tracing Protocol (by which you could automate certain queries to the various nodes a message had passed through) would be an extremely useful tool -- not a silver bullet, of course -- in the fight against spam, and in particular in efforts to enforce antispam laws. Is anyone interested in working on a spec for this?

-- Nathaniel


On Friday, January 16, 2004, at 06:30  AM, Bruce Lilly wrote:


Pete Resnick wrote:


On 1/15/04 at 7:09 PM +0100, Arnt Gulbrandsen wrote:

5. obs-received merits discussion on its own. RFC 2822 says

  obs-received = "Received" *WSP ":" name-val-list CRLF

which Bruce changes to

obs-received = "Received" *WSP ":" [CFWS] name-val-list [ ";" [CFWS] obs-date-time ] CRLF

An incompatible change, but perhaps correct.


Yes, I think Bruce fixes a bug in 2822 here.

Received has issues, both w.r.t. 2822 and 2821.

First, a historical overview: Received was first specified in RFC 821, where it is also known as a "time stamp line". There have been, and still are, discrepancies between the 821/822 and 2821/2822 definitions of the field body. Received is one of the trace fields, and as noted in RFC 1123, was initially primarily examined by hand. 1123 permitted adding information actually useful for tracing (i.e. the peer IP address, as opposed to the HELO string, which is far too easily forged), but gave no syntax for doing so [common practice has been to include it in a comment].

Nowadays, tracing Received fields by hand is far too labor-intensive. Unfortunately, due to the exponential growth of spam, tracing is nonetheless necessary. We now have the unfortunate situation where the required information (at least in 2821) includes the easily-forged HELO/EHLO string, and the useful not-so-easily-forged connection information is relegated to an optional construct. Worse, in 2821 that optional construct is specified as some sort of structured comment; indeed, it is indistinguishable from a comment.
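
To illustrate why comment-embedded connection information is awkward for automated tracing, here is a minimal Python sketch that guesses at the peer address in a Received field body. It is a heuristic under stated assumptions, not a parser for any RFC grammar: the function name `peer_ip_candidates` is hypothetical, only IPv4 dotted quads are considered, and anything before the first whitespace-delimited "by" is treated as the "from" clause.

```python
import ipaddress
import re

# Dotted quad not embedded in a longer run of digits and dots.
_QUAD = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")


def peer_ip_candidates(received_body):
    """Return IPv4 addresses found in the "from" clause (heuristic)."""
    if not received_body.lstrip().lower().startswith("from"):
        return []  # no "from" clause, so nothing to guess from
    # Assumption: the "from" clause ends at the first standalone "by".
    from_part = re.split(r"\sby\s", received_body, maxsplit=1)[0]
    found = []
    for text in _QUAD.findall(from_part):
        try:
            found.append(ipaddress.IPv4Address(text))
        except ValueError:
            continue  # looked like a dotted quad but an octet was > 255
    return found
```

Because the address lives in a comment, nothing distinguishes it syntactically from any other comment text, so a tool like this can only ever return candidates.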

What is really desirable is something that has reliable trace information in machine-readable form, perhaps with the easily-forged information relegated to comments. But that's another discussion...

Prior to 2822, a time stamp line (a.k.a. Received field) always required a time stamp. 2822 permits a time stamp line w/o a time stamp, which is an oxymoron. I don't personally like that (IMO the time stamp should be mandatory), but that's what 2822 says, and that's part of what is included in the syntax above.

Another issue, not included above, is an incompatibility which has crept in. 821/822 required SP before the semicolon which delimits the start of the date-time. However, most MTAs incorrectly omit that space [in some, such as sendmail, that can be easily rectified via a run-time configuration patch; others, such as qmail, have that error hard-coded]. 2821 requires CFWS immediately before the semicolon, but 2822 makes it optional. Given 821/822/2821's requirements, I'd be inclined to revise 2822 to make at least CFWS mandatory before the semicolon when generating a message, and given past (clearly wrong, but quite widespread) practice, I'd require being able to parse a Received field w/o CFWS before the semicolon (via obs- syntax).
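
The "strict when generating, lenient when parsing" position above could be sketched roughly as follows in Python. This is an illustration only: `split_received` is a hypothetical name, the date-time is assumed to follow the last semicolon in the field body, and semicolons inside comments or quoted strings are not handled.

```python
import re


def split_received(body):
    """Split a Received field body into (items, date_time, had_ws).

    had_ws reports whether whitespace immediately preceded the
    semicolon, i.e. whether the field meets the stricter rule.
    """
    # Unfold: drop each line break that is followed by SP or HTAB.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", body).strip()
    head, sep, tail = unfolded.rpartition(";")
    if not sep:
        # No semicolon at all: a time stamp line without a time stamp.
        return unfolded, None, False
    had_ws = head[-1:].isspace()
    return head.rstrip(), tail.strip(), had_ws


# Accepted either way; only the flag differs.
items, when, had_ws = split_received(
    "from nychubg02.cbs.com (170.20.9.151)\n by web197 with SMTP; 9 Jan 2004 20:47:13 -0000"
)
```

A generator would always emit the whitespace; a consumer can use `had_ws` to log or count non-conforming fields without rejecting them.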

One more remaining incompatibility between 2821 and 2822 lies in the permitted constructs; 2821 permits a quoted string as an item value (via 2821's "String") whereas 2822 has no such provision. That shows up in the "id" component, which has a long history of conflicts between 821/822 (1123 tried to rectify the conflict, but only added more confusion).

Yet another incompatibility is that 2821 permits a mix of angle-addrs (a.k.a. Paths) and addr-specs (a.k.a. Mailboxes) in a "for" component, whereas 2822 permits a single addr-spec or multiple angle-addrs (and no mixture). It turns out that for rather complicated reasons, the 2821 provision for multiple addr-specs is rather difficult to parse. Perhaps 2821's successor should address that issue; in any event, let's at least remove the remaining conflicts between the 2821 and 2822 definitions one way or another.

I note also that there exist broken implementations which generate cruft that cannot be parsed even with 2822's exceptionally liberal rules. Here are some real-world examples:

Received: from web197.nyc01.cbsig.net ([63.240.56.197])
        by mx08.mrf.mail.rcn.net with smtp (Exim 3.35 #7)
        id 1Af3Xn-0005vx-00
        for blilly(_at_)erols(_dot_)com; Fri, 09 Jan 2004 15:47:47 -0500
Received: (qmail 28572 invoked from network); 9 Jan 2004 20:47:13 -0000
Received: from nychubg02.cbs.com (170.20.9.151)
 by web197 with SMTP; 9 Jan 2004 20:47:13 -0000
Received: by nychubg02.cbs.com with Internet Mail Service (5.5.2656.59)
        id <ZC7JJYV9>; Fri, 9 Jan 2004 15:41:17 -0500

That's one recent example; among the problems:
  - as noted above, the SP/CFWS-before-semicolon issue
  - ids other than properly-constructed msg-ids
  - RFC 821 does not permit day-of-week in the time stamp
  - missing from and/or by components in some cases
  - illegal (non-RFC 1700 cruft) in "with" components
  - there is no defined "Mail" item-name
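
As a rough illustration of how the "with" problem in the list above might be detected mechanically, here is a Python sketch. The helper names are hypothetical, and `ALLOWED_WITH` is an assumption restricted to the two values named as legal in this thread; the registered set may be larger. Parentheses inside quoted strings are not handled.

```python
ALLOWED_WITH = {"SMTP", "ESMTP"}  # assumption: only the values named in this thread


def strip_comments(text):
    """Replace parenthesized comments (nesting allowed) with a space."""
    out, depth, i = [], 0, 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            if depth == 0:
                out.append(text[i:i + 2])  # keep quoted-pair outside comments
            i += 2
            continue
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")
        elif depth == 0:
            out.append(ch)
        i += 1
    return "".join(out)


def with_value(body):
    """Return the token following "with", or None if there is none."""
    items = strip_comments(body).rsplit(";", 1)[0]
    tokens = items.split()
    lowered = [t.lower() for t in tokens]
    if "with" not in lowered:
        return None
    idx = lowered.index("with")
    return tokens[idx + 1] if idx + 1 < len(tokens) else None


def with_is_legal(body):
    """True when "with" is absent or its value is in ALLOWED_WITH."""
    value = with_value(body)
    return value is None or value.upper() in ALLOWED_WITH
```

Applied to the fields quoted above, the Exim line yields "smtp", while the "Internet Mail Service" line yields "Internet", which is where a strict parser would stop.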

Received: from panic.noceast.dws.disney.com (panic.corp.disney.com [153.6.248.200]) by mail.disney.com (Switch-3.1.2/Switch-3.1.0) with ESMTP id h9NCwuN4022589
        for <blilly(_at_)erols(_dot_)com>; Thu, 23 Oct 2003 05:58:57 -0700 (PDT)
Received: from sm-flor-xc03.wdw.disney.com (sm-flor-xc03.wdw.disney.com [172.16.177.30]) by panic.noceast.dws.disney.com with ESMTP; Thu, 23 Oct 2003 08:55:03 -0400
Received: from sm-flor-xc01.wdw.disney.com ([172.16.177.21]) by sm-flor-xc03.wdw.disney.com with Microsoft SMTPSVC(5.0.2195.5329);
         Thu, 23 Oct 2003 08:59:43 -0400
Received: from SM-NYNY-XC01.nena.wdpr.disney.com ([167.13.137.76]) by sm-flor-xc01.wdw.disney.com with Microsoft SMTPSVC(5.0.2195.5329);
         Thu, 23 Oct 2003 08:59:42 -0400
Received: from sm-nyny-xm05.nena.wdpr.disney.com ([167.13.137.80]) by SM-NYNY-XC01.nena.wdpr.disney.com with Microsoft SMTPSVC(5.0.2195.6713);
         Thu, 23 Oct 2003 08:59:41 -0400


That's even worse; an additional problem is that parsing fails after the (illegal) with component on encountering a lone "SMTPSVC". Microsoft was informed about that bug in Windows 2000 well before SP1; 3 service packs and as many years later, the bug still hasn't been fixed. It shouldn't take more than 10 seconds for a competent programmer to modify the source to a) use a legal value (ESMTP or SMTP) in the with component, or b) elide the optional with component, or c) put the marketing BS in a comment.

For the record, I am NOT in favor of extending the syntax to accept such cruft -- I wish that certain purveyors of brokenware would clean up their acts.


