On Wed October 20 2004 23:05, Laird Breyer wrote:
On Oct 20 2004, Justin Mason wrote:
As you said previously, the Received header is always prepended. This can
be exploited more simply for the Processed header, as long as the
Processed header is *also* prepended *only*, and never appended, by each
Just a quick explanation on why quoting text is perhaps a good idea.
If each piece of software which writes a Processed header always puts
it at the top, then you're right it's enough to look for the next
Received header down to discover where the Processed header was added,
so there's no need to quote ID or datestamp.
The ID or datestamp quoting described is intended to cope with
Processed headers which are either added somewhere else (e.g. many
filters add headers at the end of the list - sometimes it's simply
easier to add at the top, sometimes it's simply easier to add at the
bottom), and also to cope with possible third party rearranging of
headers (because header order is only guaranteed for Received headers
A fundamental problem is that the Internet mail model does
not permit adding arbitrary fields "at the top", "at the bottom",
or anywhere else. It provides only for addition or modification
of specific fields by Submission agents, and for time stamp
fields (a.k.a. Received fields) by SMTP receivers, and for a
Return-Path field when final delivery is performed. There is no
provision for any other addition, and provision only for removal
of Return-Path fields by an SMTP receiver. Otherwise,
message transport via SMTP is intentionally transparent.
Incidentally, RFC 2822 requires that order of trace (i.e.
Received and Return-Path) and Resent- fields be preserved;
"only guaranteed for Received" is incorrect.
Once a message leaves the SMTP transport environment, of
course there can be additional processing. However, if the
message content (header or body) is changed, then one
is no longer dealing with exactly the *same* message.
There is an existing mechanism that can be used to allow
addition of arbitrary content, with authentication, and
while permitting the original message to be recovered
intact: simply wrap the original as MIME type message/rfc822.
If there is additional content, it and the wrapped message
can be packaged as multipart/mixed, and if authentication
is desired, that package can then be signed using the
multipart/signed MIME type (S/MIME and PGP/MIME use that
mechanism and are widely deployed).
Such additional content need not be formatted as message
header fields; it could be any media type, including plain
text, some XML variant, etc. In particular, it need not
inflict on parsers the need to be able to handle the
complexities associated with RFC 2231 parameters.
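
As a rough illustration of the wrapping approach described above, here is a
minimal sketch using Python's standard email package. The addresses, the
annotation text, and the helper name wrap_with_annotation are hypothetical,
and the multipart/signed step is left out because it needs an S/MIME or
PGP/MIME implementation.

```python
from email import message_from_string
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A stand-in for the message as it arrived (hypothetical addresses).
original = message_from_string(
    "From: alice@example.org\n"
    "To: bob@example.net\n"
    "Subject: hello\n"
    "\n"
    "Original body.\n"
)


def wrap_with_annotation(msg, note):
    """Package `note` and the untouched message as multipart/mixed."""
    outer = MIMEMultipart("mixed")
    outer["Subject"] = "Annotated message"
    # The added content is ordinary text here; any media type would do.
    outer.attach(MIMEText(note, "plain"))
    # MIMEMessage produces a message/rfc822 part holding the original.
    outer.attach(MIMEMessage(msg))
    return outer


wrapped = wrap_with_annotation(original, "Checked by a hypothetical filter.")

# The original can be pulled back out of the second part.
recovered = wrapped.get_payload()[1].get_payload(0)
print(recovered["Subject"])
```

The original header and body are never edited; the annotation travels
beside them rather than inside them.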
identifying date-time: unfold the Received line, normalize spaces,
look for the first ';' from the end and take everything after that.
    Received: (a comment; blah blah) ....
will not be handled properly by that simplistic algorithm. Nor
will a field with a semicolon anywhere else (e.g. in the identifier).
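
To make the failure concrete, here is a minimal sketch of the "last
semicolon" approach as described. The function name naive_datetime and the
host name are hypothetical; the unfolding rule assumed is the usual one
(drop a line break that is immediately followed by a space or tab).

```python
import re


def naive_datetime(field):
    """Unfold, normalize spaces, return whatever follows the last ';'."""
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", field)
    normalized = re.sub(r"[ \t]+", " ", unfolded).strip()
    _head, sep, tail = normalized.rpartition(";")
    return tail.strip() if sep else ""


# A well-behaved field: the only ';' precedes the date-time.
ok = ("Received: by mx.example.com with SMTP id 7280B11DCC;\n"
      "\tFri, 5 Mar 2004 19:00:48 -0800")

# A field whose trailing comment happens to contain a ';'.
bad = ("Received: by mx.example.com; "
       "Fri, 5 Mar 2004 19:00:48 -0800 (PST; local)")

print(naive_datetime(ok))   # Fri, 5 Mar 2004 19:00:48 -0800
print(naive_datetime(bad))  # local)
```

Stripping comments first would help with the second case, but not with a
semicolon inside a quoted string or some other component.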
identifying ID: unfold the Received line, take the first string after
the keyword "id" if it exists (if you want to be more correct, remove
comments (in parentheses),
    Received: from from . by . id by example . com ...
will mistake the (valid!) top-level domain name component "id"
for the "id" attribute name.
then look for the keywords from, by and
pick the first time you see " id " after that.
No, that is still no good for the simple example above.
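
A minimal sketch of the keyword-scanning approach shows what goes wrong.
The names strip_comments and naive_id are hypothetical, and the comment
stripper is deliberately simple (it handles nesting but ignores quoted
pairs and quoted strings).

```python
import re
from typing import Optional


def strip_comments(text):
    """Drop parenthesized comments, allowing for nesting."""
    out, depth = [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def naive_id(field) -> Optional[str]:
    """Return the token following the first bare 'id' token, if any."""
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", field)
    tokens = strip_comments(unfolded).split()
    for i, tok in enumerate(tokens[:-1]):
        if tok.lower() == "id":
            return tokens[i + 1].rstrip(";")
    return None


normal = ("Received: from a.example by b.example (some MTA) "
          "with SMTP id 7280B11DCC; Fri, 5 Mar 2004 19:00:48 -0800 (PST)")
tricky = "Received: from from . by . id by example . com"

print(naive_id(normal))  # 7280B11DCC
print(naive_id(tricky))  # by
```

In the second field "id" is the last label of the domain from.by.id, so
the scan reports "by" as an identifier although the field has none.
Requiring "from" or "by" to appear first does not help, since both
precede the stray "id" here as well.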
Or you can properly
parse the line according to RFC 2822 and try to figure out the id.
There are several problems with that:
1. Received fields may have been generated in accordance with
RFC 2821 (indeed, that is likely the case, since it is SMTP
receivers which are supposed to generate the field), and there
are conflicts between RFCs 2821 and 2822.
2. Received fields may have been generated in accordance with
RFC 821, and there are conflicts between RFCs 821 and 822.
3. Some broken software generates fields which cannot be parsed
even with the very liberal RFC 2822 parse syntax, e.g.
    Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC;
        Sun, 15 Aug 2004 23:38:16 -0700
4. The id component is optional; it might not be present at all.
For that matter, RFC 2822 has relaxed the syntax to the oxymoronic
point that the time stamp need not be present in the time stamp
field; a field with no date-time at all is a perfectly legal complete
Received field per RFC 2822 parse syntax.
By the way, there's an even easier way to quote a piece of the
Received line for use within a Processed header field:
unfold the Received line, normalize spaces, then compute a hash value
of the whole line.
Provided that the algorithms for unfolding and normalization are clearly
and carefully specified and implemented, that could avoid the problems
Then if you want to verify whether a Processed
header is associated with a given Received line, do the same thing and
check the hash value.
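
The unfold, normalize, and hash steps could be sketched as follows. This
assumes SHA-1 as the digest (the text does not name one), UTF-8 encoding
of the field, and the simple unfolding and space-collapsing rules shown;
the function name received_fingerprint and the host names are
hypothetical.

```python
import hashlib
import re


def received_fingerprint(field):
    """Hash a Received field after unfolding and collapsing whitespace."""
    # Unfold: remove a line break that is followed by a space or tab.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", field)
    # Normalize: collapse runs of spaces and tabs to a single space.
    normalized = re.sub(r"[ \t]+", " ", unfolded).strip()
    # Assumption: SHA-1, rendered as upper-case hexadecimal.
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest().upper()


folded = ("Received: from a.example by b.example\n"
          "\twith SMTP id 7280B11DCC;\n"
          "  Fri, 5 Mar 2004 19:00:48 -0800")
flat = ("Received: from a.example by b.example "
        "with SMTP id 7280B11DCC; Fri, 5 Mar 2004 19:00:48 -0800")

# Folding differences vanish; any other difference changes the digest.
print(received_fingerprint(folded) == received_fingerprint(flat))
```

Verification only works if both sides apply exactly the same unfolding
and normalization rules before hashing.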
    Received: from 185129182.virtua.com.br (185129182.virtua.com.br
        [188.8.131.52]) by smtpin-3211.bay.webtv.net (WebTV_Postfix+sws)
        with SMTP id 7280B11DCC; Fri, 5 Mar 2004 19:00:48 -0800 (PST)
This string could hash to the value B17C07174B5E4546A2B04EB096E83FD7081936B8
and then you would have
Processed: name="SpamAssassin"; location-ip="184.108.40.206";
There's a more fundamental issue with the whole scheme; you're
simply trying to authenticate some added content (and as noted
above, there are existing mechanisms that can do so without
running roughshod over the existing transport mechanism). Let's
assume for the moment that a 100% foolproof authentication
mechanism is in place and that the field above is guaranteed to
have been inserted by the party that claimed to do so. I submit
that it tells me nothing particularly useful; I have no guarantee
that the claimed processing was performed or that the claimed
results are accurate. Even if I have blind trust in the party that
claimed to have inserted that field, I have no way of knowing if
the claimed name "SpamAssassin" is what I think it is, and if it is,
I have no idea whether it was modified at that site or if it is
"vanilla" SpamAssassin. Even if I assume that it is unmodified,
I have no idea what configuration files were used with it or
what run-time options were specified. In short, purported results
of an indeterminate process with indeterminate configuration
and indeterminate run-time parameters allegedly run at a
remote site are of little value.