Re: Filtering bogus Received lines

Excerpts from mail: (15-Sep-97) Filtering bogus Received lines by Brian Buchanan

Does anyone have a procmail recipe for catching spams by checking for
lines like:

Received: from spoofed.site (real.site.here [real.ip]) by mail.host.com

I'd like to be able to check if the 2nd level domain of what is claimed in
the HELO and that of what the relay actually reports as the hostname for
the sending host differ.  99% of spams I see have this characteristic.


Unfortunately, a fair amount of legitimate e-mail from dial-up PPP accounts
and other sources also has this characteristic, so it is far from a
foolproof way of identifying spam. Still, it might be useful as one of
several criteria for deciding whether an e-mail is spam.

What you are asking is complicated by the fact that most e-mails have
multiple Received: headers. Within procmail itself, the only way I know of
to handle that is with recursive INCLUDERCs. You could instead use a Perl
script, and if you have no qualms about running Perl on each incoming
e-mail, that would be a much cleaner solution. However, as I said, it is
possible to do this in procmail alone and, since I'd never done a recursive
INCLUDERC before, I thought I'd give it a shot.

In addition to

Received: from spoofed.site (real.site.here [real.ip]) by mail.host.com

this recipe also checks

Received: from [spoofed.ip] (real.site.here [real.ip]) by mail.host.com

I found a fair number of e-mails with that type of Received: header in my
mail archives, and the check is so similar to the first one that it was
straightforward to add.

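OK, in your .procmailrc, put the following:

#--------------------------------------------------------------------------
# Each [ ] bracket here and in check_received.rc should contain one space
# and one tab (the tab may not survive mail or copy/paste).
RECEIVEDHDR = "(Received:[      ]*from [^ ]+ \([^ ]+ \[[0-9.]+\]\) by )"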

:0
* $ ^\/$RECEIVEDHDR(.*$)+
{
     HEADERLINES = $MATCH

     BAD_RECEIVED_FLAG = 0

     INCLUDERC = check_received.rc

     :0:
     * BAD_RECEIVED_FLAG ?? ! ^^0^^
     mbox.possible-spam
     # or whatever you want to do with it...

}
#--------------------------------------------------------------------------

Then, make a file named check_received.rc with the following recipe in it:

#--------------------------------------------------------------------------
:0
* HEADERLINES ?? $ ^^(.*$)+\/$RECEIVEDHDR(.*$)+
{ REMAININGLINES = $MATCH }

:0E
{ REMAININGLINES }

:0
* HEADERLINES ?? $ ^^Received:[         ]*from \[[0-9.]+\] \([^ ]+ \[\/[0-9.]+
{
     REVERSE_IP = $MATCH

     :0
     * HEADERLINES ?? $ ^^Received:[    ]*from \[\/[0-9.]+
     {
          HELO_IP = $MATCH

          :0
          * HELO_IP ?? .
          * REVERSE_IP ?? .
          * HELO_IP ?? ! $ ^^$\REVERSE_IP^^
          { BAD_RECEIVED_FLAG = 1 }
     }
}

:0E
* HEADERLINES ?? $ ^^Received:[         ]*from [^ ]+ \(\/[^ ]+
{
     REVERSE_HOST = $MATCH

     :0
     * HEADERLINES ?? $ ^^Received:[    ]*from \/[^ ]+
     {
          HELO_HOST = $MATCH

          :0
          * HELO_HOST ?? .
          * REVERSE_HOST ?? .
          * HELO_HOST ?? ! $ ^^$\REVERSE_HOST^^
          { BAD_RECEIVED_FLAG = 1 }
     }
}

# Now recurse, but only if BAD_RECEIVED_FLAG is still zero (no bad header
# found yet) and REMAININGLINES is not an empty string.
:0
* BAD_RECEIVED_FLAG ?? ^^0^^
* REMAININGLINES ?? .
{
     HEADERLINES = $REMAININGLINES

     INCLUDERC = $_

}
#--------------------------------------------------------------------------

Now, if only all e-mail servers would upgrade to the latest and greatest
version of sendmail, then all Received: headers would be in one of these two
formats! As it is, I'd estimate that fewer than a quarter of all Received:
headers match either format. Too bad.

One possible additional feature I was considering was to count the number of
"bad" Received: headers. So instead of just assigning BAD_RECEIVED_FLAG the
value of one, it would increment it for each bad Received: header it found.
Then, in your spam processing, you could treat it differently depending on
how many bad Received: headers an e-mail has. To implement that, you'd need
to use scoring to increment the variable. I'm not particularly fluent in
scoring, but I think I've figured out how to do that: Just replace
"{ BAD_RECEIVED_FLAG = 1 }" in both places in check_received.rc with the
following:

          {    # Increment BAD_RECEIVED_FLAG by 1.
               :0
               * $ $BAD_RECEIVED_FLAG^0
               * $ 1^0
               { BAD_RECEIVED_FLAG = $= }
          }

and remove "* BAD_RECEIVED_FLAG ?? ^^0^^" from the last recipe in
check_received.rc. Anybody want to check me on that?
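
If the counting version works as intended, the delivery recipe inside the
braces in .procmailrc could then branch on the count. This is only an
untested sketch: it assumes BAD_RECEIVED_FLAG ends up holding a plain
integer, and mbox.probable-spam is a made-up folder name for illustration.

#--------------------------------------------------------------------------
     # Two or more bad Received: headers: file as likelier spam.
     # (mbox.probable-spam is a hypothetical folder name.)
     :0:
     * BAD_RECEIVED_FLAG ?? ^^([2-9]|[1-9][0-9]+)^^
     mbox.probable-spam

     # Exactly one bad Received: header: same folder as before.
     :0:
     * BAD_RECEIVED_FLAG ?? ^^1^^
     mbox.possible-spam
#--------------------------------------------------------------------------

Since the first recipe delivers the message, the second one is only ever
reached when the count is below two.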

Also, does anyone know if it's possible to have procmail check to see if
any Received: lines appear after headers such as From, Message-Id, To,
etc.? Many spammers also make this mistake.


Again, some legitimate e-mail can have Received: headers that come after
From: headers, so I would only recommend using this criterion in conjunction
with other criteria in attempting to determine whether an e-mail is spam or
not. I use a variation on the following recipe:

:0:
* ^From:.*$Received:
mbox.possible-spam
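
The recipe above only fires when Received: is the very next header after
From:. A looser form (an untested sketch, and likely to catch more
legitimate mail) would also match a Received: line that appears any number
of lines after From:, To:, or Message-Id:. It uses (.*$)+ to skip whole
header lines, the same trick as in check_received.rc:

:0:
* ^(From|To|Message-Id):(.*$)+Received:
mbox.possible-spam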

Later,
Ed