a header authentication scheme


I'd like to solicit comments and opinions on part of an idea
which is being developed on a different mailing list, namely
the ASRG spam filtering list:

http://asrg.sp.am/subgroups/filtering.shtml

I'm CC'ing the filtering list, but you may need to be subscribed to
post there, so you might elect only to reply here.

I'll give a few paragraphs of introduction, followed by a synopsis of
the idea. I hope your input will help refine and improve the work
there.

Introduction:

The ASRG spam filtering subgroup is trying to come up with ideas for
eventual standards related to the filtering performed by spam filter
software on RFC 2822 messages. With the plethora of spam filters,
virus filters and so on, there's an explosion in the number of
proprietary headers being added to a message during transport to the
MUA (i.e., not necessarily just SMTP transport, but also, for example,
POP3 download or procmail processing).

It seems like a good idea to look for ways to standardize such
headers, so that the headers added by filtering software can be
inspected by other software and by humans, not just by the software
that wrote them.

There's currently a filtering draft being edited (it's not an official
IETF draft, more a working document), in which the proposal
is to define an extensible header which in one particular instance
might look like this:

   Processed: name="SpamAssassin"; version="2.63"; function="spamcheck";
       auth-received="7280B11DCC"; result-tag="spam";

The idea is to define key-value pairs, some of which would have
universal meanings while others would be added as required by the
software writing the header. The reason for choosing MIME-style
key-value pairs is that parsing the header should then be a fairly
uniform task, regardless of which software wrote it. Unrecognized
keys would be ignored, while recognized keys may or may not be acted
upon by downstream software which receives or passes along the
message.
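To make the parsing argument concrete, here is a minimal Python sketch
of how downstream software might read a Processed: header into a
dictionary. It assumes a simplified grammar (keys made of letters,
digits and hyphens; values either a double-quoted string with
backslash escapes or a bare token) rather than the full MIME parameter
syntax of RFC 2045, and the function name parse_processed is
hypothetical.

```python
import re
from typing import Dict

# One parameter: key = "quoted value" or key = token, then an optional ';'.
# Simplified grammar (an assumption, not the draft's): keys are letters,
# digits and '-'; quoted values may contain backslash-escaped characters.
_PARAM = re.compile(
    r'\s*([A-Za-z0-9-]+)\s*=\s*'            # key and '='
    r'(?:"((?:[^"\\]|\\.)*)"|([^\s;"]+))'    # "quoted string" or bare token
    r'\s*;?'                                 # optional separator
)


def parse_processed(value: str) -> Dict[str, str]:
    """Parse the value of a Processed: header into a dict of key -> value.

    Keys are lower-cased for case-insensitive lookup. Unknown keys are kept;
    ignoring them is left to the caller. Parsing stops at the first piece
    of text that does not look like a parameter.
    """
    # RFC 2822 unfolding: drop line breaks that precede whitespace.
    value = re.sub(r'\r?\n(?=[ \t])', '', value)
    params: Dict[str, str] = {}
    pos = 0
    while pos < len(value):
        m = _PARAM.match(value, pos)
        if m is None:
            break
        key, quoted, token = m.groups()
        if quoted is not None:
            val = re.sub(r'\\(.)', r'\1', quoted)  # undo backslash escapes
        else:
            val = token
        params.setdefault(key.lower(), val)  # first occurrence wins
        pos = m.end()
    return params
```

Unrecognized keys simply become extra dictionary entries, which the
caller is free to ignore; keeping the first occurrence of a repeated
key is an arbitrary choice made for this sketch.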

The problem:

Now that you have a little background, I want to describe the idea I'm
particularly interested in. When you see a header such as the one I've
sketched above in a message, it's not clear where in the message
transport this header was added. Suppose A sends a message to B, and
the message passes through several machines M1, M2, M3 say:

A -> M1 -> M2 -> M3 -> B

For argument's sake, we'll assume A is the sender's MUA, B is the
receiver's MUA, and little is known about the other machines. The
software which adds the Processed: header shown above may reside in A,
M1, M2 or M3. The task is to let B discover, through a standard,
where the header was added.

For example, if A is a spammer, then he might add the Processed: header
at A, expecting to fool B into accepting the verdict at face value.
That would be a forgery which B wants to recognize.

This is a fairly realistic situation, since spammers routinely forge
headers, and if the message's path before arrival at B is sufficiently
complex, then B may not have a full picture of which filters are
installed at which locations, and how each such filter behaves. So the
location hint must be self-explanatory, i.e., verifiable from the
message alone.

Of course, simply inserting the filter's location (e.g., its IP
address) among the key-value pairs in the Processed: header is not
reliable, e.g.

   Processed: name="SpamAssassin"; location-ip="1.2.3.4"; 
       version="2.63"; function="spamcheck";
       auth-received="7280B11DCC"; result-tag="spam";

does not solve the problem at all, since a forger at A could simply
insert the correct IP address.

A particularly important subproblem is the following: how can B recognize
if the Processed: header was added at M3 versus added earlier?

The (partial) solution:

I shall now propose, for discussion, a partial solution which was
worked out on the ASRG filtering list. Recall that a fundamental fact
of SMTP transport is that Received: headers are added at the top and
must not be rearranged.

A consequence of this is that, once the message is at B, the topmost
Received: line is guaranteed to have been added by M3 (assuming M3 is
an SMTP server). This means that any filter at or after M3 will be
handed a message that already contains the topmost Received: line (as
B will eventually see it), and such a filter can prove that it is
located at or after M3 by quoting a substring from that line.

The question is which piece of the Received: line should be quoted.
It would have to be a piece which cannot easily be guessed by forgers
located before M3 (i.e., at A, M1 or M2). For example, the server M3's
domain name would be a poor choice, since that name could be discovered
through various means by the spammer and, once known, quoted many times.

Two pieces of information in a Received: line change constantly: the
ID (which is optional) and the date-time stamp. So one of these two
must be quoted in each Processed: header line.
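The following Python sketch shows how a filter (or B) might pull both
candidate tokens out of the topmost Received: header using only the
standard email module. It assumes the ID is the word following the
keyword "id" and that the date-time is everything after the last
semicolon, following the Received: grammar of RFC 2821; the helper
names are hypothetical.

```python
import email
import re
from email.message import Message
from typing import Optional, Tuple


def topmost_received(msg: Message) -> Optional[str]:
    """Return the topmost Received: header, unfolded, or None if absent."""
    received = msg.get_all("Received") or []  # in order of appearance
    if not received:
        return None
    # RFC 2822 unfolding: remove the line break before continuation
    # whitespace, but leave the whitespace itself untouched.
    return re.sub(r"\r?\n(?=[ \t])", "", received[0]).strip()


def received_tokens(received: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a Received: value into (ID, date-time); the ID may be None."""
    head, sep, date = received.rpartition(";")
    if not sep:
        return None, None  # malformed: no date-time part at all
    m = re.search(r"\bid\s+(\S+)", head, re.IGNORECASE)
    return (m.group(1) if m else None), date.strip()


def tokens_from_text(raw: str) -> Tuple[Optional[str], Optional[str]]:
    """Convenience wrapper: parse raw message text and return its tokens."""
    top = topmost_received(email.message_from_string(raw))
    return received_tokens(top) if top is not None else (None, None)
```

The date-time is returned verbatim rather than parsed into a datetime,
since the idea is to quote it character for character.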

For example, if the topmost Received: line (at M3) is

   Received: from 185129182.virtua.com.br (185129182.virtua.com.br
       [200.185.129.182]) by smtpin-3211.bay.webtv.net (WebTV_Postfix+sws)
       with SMTP id 7280B11DCC; Fri,  5 Mar 2004 19:00:48 -0800 (PST)
 
then a Processed: header of the form
 
   Processed: name="SpamAssassin"; location-ip="1.2.3.4"; 
       version="2.63"; function="spamcheck";
       auth-received="7280B11DCC"; result-tag="spam";

is guaranteed to have been added *after* the Received: line was inserted
in the message, i.e., at or after M3. Such a Processed: line cannot be
forged unless the auth-received value can somehow be predicted with high
probability.

Similarly, a Processed: header of the form

   Processed: name="SpamAssassin"; location-ip="1.2.3.4"; 
       version="2.63"; function="spamcheck";
       auth-received="Fri,  5 Mar 2004 19:00:48 -0800 (PST)"; 
       result-tag="spam";

is unforgeable unless the timestamp inserted at M3 can be predicted
at M2 or earlier. 
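Putting the two variants together, here is a minimal Python sketch of
both ends of the scheme: stamp is what a filter at or after M3 might
run to build a Processed: value quoting the topmost Received: ID
(falling back to the date-time when there is no ID), and verify is
what B might run to check that auth-received matches one of those two
tokens. The function names are hypothetical, the header is trimmed to
the name, auth-received and result-tag keys from the draft's example,
and the Received: parsing makes the same simplifying assumptions as a
plain match on the "id" clause and the text after the last semicolon.

```python
import email
import re
from typing import Optional, Tuple

# Hypothetical helpers; only the key names come from the draft's example.


def _unfold(value: str) -> str:
    # RFC 2822 unfolding: drop line breaks that precede whitespace.
    return re.sub(r"\r?\n(?=[ \t])", "", value).strip()


def _topmost_tokens(raw_message: str) -> Tuple[Optional[str], Optional[str]]:
    """(ID, date-time) of the topmost Received: line; either may be None."""
    received = email.message_from_string(raw_message).get_all("Received") or []
    if not received:
        return None, None
    head, sep, date = _unfold(received[0]).rpartition(";")
    if not sep:
        return None, None
    m = re.search(r"\bid\s+(\S+)", head, re.IGNORECASE)
    return (m.group(1) if m else None), date.strip()


def stamp(raw_message: str, name: str, result_tag: str) -> Optional[str]:
    """Filter side (at or after M3): quote the topmost Received: ID,
    or its date-time if there is no ID. None if nothing can be quoted."""
    rid, rdate = _topmost_tokens(raw_message)
    token = rid or rdate
    if not token:
        return None
    return f'name="{name}"; auth-received="{token}"; result-tag="{result_tag}";'


def verify(raw_message: str, processed_value: str) -> bool:
    """B's side: True if auth-received quotes the topmost Received: ID
    or date-time exactly, i.e. the header was written at or after M3."""
    m = re.search(r'(?<![\w-])auth-received\s*=\s*"([^"]*)"',
                  _unfold(processed_value))
    if m is None:
        return False
    return m.group(1) in {t for t in _topmost_tokens(raw_message) if t}
```

A header quoting the ID of an older Received: line, such as one
written by M2, fails the check, which is exactly the M3-versus-earlier
distinction asked for above. Because the comparison is an exact string
match, a hop that reformats the date-time would also make verification
fail.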

I am a little worried that the timestamp is easier to predict from
M2, as it only records whole seconds, and a second can be a long time
in an SMTP transaction. If the Received: timestamp also included
microseconds (which would require an extension of RFC 2821), this would
be less risky. I also don't know how hard the optional ID would be to
predict in practice, but I expect it to be quite hard.
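To put rough numbers on this worry, suppose (purely as an assumption)
that a forger at M2 can predict the moment M3 stamps the message to
within a window of W, with every instant in that window equally
likely. A single guessed stamp of resolution r then matches with
probability of about r/W, capped at 1 (exactly so when the window
lines up with stamp boundaries). The hypothetical helper below just
evaluates that ratio in integer microseconds.

```python
from fractions import Fraction


def guess_chance(window_us: int, resolution_us: int) -> Fraction:
    """Approximate chance that one guessed timestamp equals M3's stamp.

    Assumes the stamping instant is uniform over a window of window_us
    microseconds and that stamps have a resolution of resolution_us
    microseconds; the chance is then about resolution / window, at most 1.
    """
    if window_us <= 0 or resolution_us <= 0:
        raise ValueError("window and resolution must be positive")
    return min(Fraction(resolution_us, window_us), Fraction(1))


# A 2 second window, with second-resolution versus microsecond-resolution stamps.
print(guess_chance(2_000_000, 1_000_000))  # 1/2
print(guess_chance(2_000_000, 1))          # 1/2000000
```

With a two second window, a seconds-only stamp gives the forger about
an even chance, whereas microseconds bring it down to roughly one in
two million.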

I'd like comments and opinions, particularly (but not exclusively) from
people who have implemented SMTP servers, on whether this idea is
robust or, if not, whether it is salvageable. I can't see much that's
wrong with it.
 
-- 
Laird Breyer.