procmail

RE: A simple rule for IP1<>IP2

2002-09-08 11:01:34
Tomislav Crnicki wrote:


Looking at the SPAM messages that still pass my procmail file I often 
see they have such a "Received from" line in the header (for instance 
this one from a message that just came in):

Received: from 196.40.67.195 ([210.15.67.232])

or generic:

Received: from IP1 ([IP2])

where IP1<>IP2.

What would the rule be to dump such messages to /dev/null?

I suppose this is probably more or less basic, but I didn't find such a 
rule or anything similar searching around.

Thanks in advance,

Tomi Crnicki - Abacus, Croatia

Hi, Tomi,

I left in your signature, because I wanted to remember to ask you
about the name of your town: does it have something historical to
do with the ancient counting instrument?  Anyway, interesting.

I don't find your question at all simplistic -- or basic.
However, I liked the idea quite a bit, with some revision, so
I went ahead and did it to add to my own spamsnag collection.

One must be careful with this approach: often, legitimate mail
is handed off from a client machine that uses different IP addresses
for different purposes.  For example, it could be wholly legit for
a server to broadcast one IP address for incoming SMTP and another
for outgoing.  Or maybe the server doubles as a web or ftp server,
which could also broadcast a different number.  I would be reluctant
to trash numbers mismatched in only the last part, or even in
the last half of a dotted-quad address.  So I decided to
compare the first half of the double-dotted-quads only.
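
As an illustration only (Python rather than procmail, with made-up
function names), the "first half only" comparison amounts to checking
whether two addresses share their first two octets.  The second pair
of addresses below is invented for the example:

```python
def first_half(quad: str) -> str:
    """Return the first two octets of a dotted quad, e.g. '196.40'."""
    octets = quad.split(".")
    if len(octets) != 4 or not all(o.isdigit() for o in octets):
        raise ValueError(f"not a dotted quad: {quad!r}")
    return ".".join(octets[:2])


def same_first_half(ip1: str, ip2: str) -> bool:
    """True when both addresses share their first two octets."""
    return first_half(ip1) == first_half(ip2)


# The pair from the question differs in the very first octet.
print(same_first_half("196.40.67.195", "210.15.67.232"))
# A pair that differs only in the last half is treated as a match.
print(same_first_half("192.0.2.10", "192.0.77.5"))
```

Under this rule the pair from the original question counts as a
mismatch, while a host that merely uses two nearby addresses does not.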

I also chose to ignore the other case you're asking about, where
the two identifiers are host names rather than IP numbers, because
those are mismatched very often by legitimate servers.  One name
could be an mx-record alias for the other, for instance.

Your idea dovetails nicely with some other Received: header stuff
I have worked on very recently and whose results I am pleased
with.  So let's build a solution here.

First, you have to decide which sets of Received: headers you're
going to bother looking at.  I wouldn't find it all that useful
to check every Received: header for this.  The bottom one, though,
is often forged in spam, and makes a good target for the test.  I
am finding that my other tests directed at the bottom Received:
header alone identify about 20% of my spam.  The test I
designed based on your idea identified four of my most recent
100 spams (I save the most recent 100 in a running cache).

So, how do we find the bottom Received?  This link points to
an article posted to this list a couple of weeks ago that
shows how:

http://www.rosat.mpe-garching.mpg.de/mailing-lists/procmail/2002-08/msg00507.html

Call the variable you save the line to whatever you want.  I'll call it
"BOTTOM" here.
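
For readers who want to see what "the bottom Received" means in
practice, here is a rough sketch in Python (not the procmail method
from the linked article, which is not reproduced here).  The sample
message and the function name are invented for illustration; only the
last Received: line is taken from the original question:

```python
from email import message_from_string

# Made-up sample: two Received: headers, the bottom one being the
# suspicious line quoted in the question.
SAMPLE = """\
Received: from mail.example.net (mail.example.net [192.0.2.25])
\tby mx.example.org with ESMTP
Received: from 196.40.67.195 ([210.15.67.232])
\tby mail.example.net with SMTP
Subject: test

body
"""


def bottom_received(raw: str) -> str:
    """Return the last Received: header, unfolded onto one line."""
    msg = message_from_string(raw)
    received = msg.get_all("Received", [])
    if not received:
        return ""
    # Collapse the folding whitespace (newline plus tab) to single spaces.
    return " ".join(received[-1].split())


BOTTOM = bottom_received(SAMPLE)
print(BOTTOM)
```

Headers are stacked newest first, so the last Received: in the header
block is the one closest to the original sender.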

[Okay, at this point in composing my answer, Bart Schaefer's response
to the list has come in.  It looks very good (as usual -- thanks, Bart);
but I had some problems with the quoting when I experimented an hour
ago using similar methods.  Moreover, I stick by what I said about
testing only the first half of the dotted quads.  And what happens when
the client in the Received: header hands off to the server, and both
use dotted-quad addressing?  If we just look for two (or more) sets of
dotted quads in the header, we're likely to get fooled.  (Yes, I noticed
that Bart looked for parens around the second pair.  That's good for
canonical headers, but bad for finding spam, where the forged headers
often depart from the expected standards.)  So I will go ahead and
continue with this much more longwinded version of an answer.]

First of all, somewhere up above I set this:

  DOTQUAD = [0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+

I doubled the backslashes before the dots because, as I said a moment
ago, I couldn't get the "$\VAR" syntax with procmail's internal quoting
to work with this.

I also have defined up above in my rc "$WS", which is set to
a space and a tab.

First, I find just the putative client machine from the bottom
Received: header, so I won't match the wrong dotted quads:

 :0  # look for sender's asserted machine
  * $  BOTTOM ?? ^^from +\/[^$WS].+[$WS]by[$WS]
  * $  MATCH       ?? ^^\/.+[^$WS]
  * $  MATCH       ?? ^^\/.+[^y]
  * $  MATCH       ?? ^^\/.+[^b]
  * $  MATCH       ?? ^^\/.+[^$WS]
  { CLIENT = $MATCH }
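
To show what the recipe above is meant to leave in CLIENT, here is a
hedged Python approximation (the function name is hypothetical, and
Python's regexp engine is not procmail's, so treat it as a sketch of
the intent rather than an exact equivalent).  It keeps everything
between the leading "from" and the last whitespace-delimited "by":

```python
import re

# Greedy match up to the last " by ", mirroring the recipe's intent of
# capturing only the sender's asserted machine.
CLIENT_RE = re.compile(r"from +([^ \t].+)[ \t]by[ \t]", re.IGNORECASE)


def client_part(bottom: str) -> str:
    """Return the text between 'from' and 'by', or '' if absent."""
    m = CLIENT_RE.match(bottom)
    if not m:
        return ""
    # Drop any whitespace left just before the "by".
    return m.group(1).rstrip(" \t")


print(client_part(
    "from 196.40.67.195 ([210.15.67.232]) by mail.example.net with SMTP"
))
```

Anything after "by" (the receiving server, which may well carry its own
dotted quad) is excluded, which is the point of this step.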

Then I do this:

 :0  # scam-tipped dotted-quad mismatch
  * $ CLIENT ?? ()$DOTQUAD\>(.*\<)?\/$DOTQUAD\>
  *   MATCH ?? [^0-9]*\/[0-9]+\.[0-9]+\.
  { SECOND_PAIREDQUAD = $MATCH }

 :0 A
  * $ CLIENT ?? ()\/$DOTQUAD\>
  *   MATCH ?? [^0-9]*\/[0-9]+\.[0-9]+\.
  { FIRST_PAIREDQUAD = $MATCH }

 :0 A:
  * $! FIRST_PAIREDQUAD ?? $SECOND_PAIREDQUAD
  suspectmail
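
The logic of these three recipes can be tried out away from procmail
with a short Python sketch.  This is an approximation under stated
assumptions: the function name is made up, it compares the first two
dotted quads it finds, and it compares the leading octets exactly,
which is stricter than an unanchored regexp match:

```python
import re

# Same shape as DOTQUAD above, with word boundaries on both ends.
DOTQUAD = re.compile(r"\b\d+\.\d+\.\d+\.\d+\b")


def is_suspect(client: str) -> bool:
    """True when the first two dotted quads disagree in their first half."""
    quads = DOTQUAD.findall(client)
    if len(quads) < 2:
        # Names, or a single address: nothing to compare.
        return False
    first, second = quads[0], quads[1]
    return first.split(".")[:2] != second.split(".")[:2]


# The header from the question: flagged.
print(is_suspect("196.40.67.195 ([210.15.67.232])"))
# Invented example differing only in the last half: not flagged.
print(is_suspect("192.0.2.10 ([192.0.77.5])"))
# A host name plus one address: not flagged.
print(is_suspect("mail.example.net ([192.0.2.25])"))
```

Running candidate Received: lines from a saved spam cache through
something like this is a cheap way to estimate false positives before
trusting the recipe with real deliveries.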


Enjoy.

-- 
Dallman Ross

"If you find a path with no obstacles, it probably does not lead to
anywhere."
        Thoughts of Rev. Sunnan Kubose, from _Zen in the Markets_ 

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail