IP number checking (was Re: Spam: Are You In Need Of A Lifestyle Change)

On Mon, 29 Sep 1997 09:23:36 +0300 (EET DST), I wrote:

Jeff Thieleke <thieleke(_at_)ix(_dot_)netcom(_dot_)com> wrote:

Received: From mailhost.UTP.net(alt1.utp..net(333.2.44.55)) by utp.net;
                                        ^^    ^^^        ^^
Oops! Each part of an IP (IPv4) number is an 8-bit value (0-255)... 333 is no good.
There is a recipe for this type of fakery, but I don't have ready
access to it at the moment. Can someone repost it?

I only have badly working ones on file. The primary problem with these
is that there will be other numbers in those headers which look a lot
like IP numbers unless you preparse them a little bit (for instance,


Blah blah. Try this: 

  * ^Received: from [^[( ]+ ?[[(]?(([a-z][-a-z0-9._]*)* ?)? ?[[(]\
        ((0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.)*\
        (25[6-9]|2[6-9][0-9]|[3-9][0-9][0-9]|[1-9][0-9][0-9][0-9]|0[])])

Paraphrase: from hostA (hostB [((valid IP numbers)\.)*invalid], with
a final octet of 0 also counting as invalid. This does not look at the
number of octets, but making it look for exactly four octets should be
fairly easy. Ideally, it should +match+ on anything with one, two,
three, or more than four octets, but leave four valid octets alone.
For logging purposes, getting all four octets into $MATCH would be
nice:

  * ^\/Received: from [^[( ]+ ?[[(]?(([a-z][-a-z0-9._]*)* ?)? ?[[(]\
        ((0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.)*\
        (25[6-9]|2[6-9][0-9]|[3-9][0-9][0-9]|[1-9][0-9][0-9][0-9])\
        (\.(0|[1-9][0-9]?|1[0-9][0-9]|2[0-4][0-9]|25[0-5]))*(\.0)?[])]
  {
    LOG="$MATCH
"
  }

This is essentially what I actually tested with; I hope I didn't break
it somewhere along the way. It still does not check for a valid number
of octets.
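
As a cross-check on the digit patterns above, the same test can be
written with plain integer comparison, which also makes the octet
count easy to check. This is only a minimal sketch in Python, not part
of the recipe. The function name address_problem and the wording of
the reasons it returns are made up for illustration, and it assumes
the dotted number has already been pulled out of the Received: line.

```python
import re


def address_problem(address):
    """Return a short reason why a dotted number is not a plausible
    IPv4 address, or None if nothing looks wrong.

    Hypothetical helper: the rules (four octets, each 0-255, final
    octet not 0) follow the discussion in this message.
    """
    parts = address.split(".")
    # Every part must be a non-empty run of ASCII digits.
    if not all(re.fullmatch(r"[0-9]+", p) for p in parts):
        return "non-numeric octet"
    if len(parts) != 4:
        return "expected 4 octets, got %d" % len(parts)
    values = [int(p) for p in parts]
    if any(v > 255 for v in values):
        return "octet out of range 0-255"
    # The recipe also treats a final octet of 0 as suspect.
    if values[-1] == 0:
        return "final octet is 0"
    return None
```

Comparing integers covers every value above 255 in one test, so there
is no range to forget when spelling out digit patterns by hand.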

I checked this quickly against the last few days' worth of spam from
the spam-list and found a handful of invalids. These were spams I had
already filtered on other grounds. Of the (67) spams my filters have
missed in the last couple of weeks, none were caught by this recipe.

is something like 4.0.994.63) but you can get pretty far by looking
only at Received: lines which are more or less like what Sendmail
generates and see if there's a "reverse lookup" number which looks
faked. The general format of these is 
  Received: from hostA by hostB (hostC [IP number])


Correction: Received: from hostA (hostB [IP number]) by hostC

Like I said, the above recipe only looks at the IP numbers in
Received: lines in exactly this form, with the slight modifications
that I allow either round or square brackets in both places, hostB is
optional (as it would be when the IP number doesn't resolve), and the
spaces before the brackets are optional (spam software seems to leave
them out a lot, probably because the people who programmed it don't
have any aesthetic sense :-)
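
To make the accepted format concrete, here is a rough Python sketch of
the same idea. It is not a translation of the procmail recipe: the
names RECEIVED and parse_received are hypothetical, the pattern is
simplified to at most one hostB token, and it only pulls the pieces
apart without judging whether the number is valid.

```python
import re

# Sketch of: Received: from hostA (hostB [IP number]) by hostC
# Assumptions mirrored from the text: either bracket style is accepted
# in both places, hostB is optional, and the spaces before the
# brackets are optional. Matching is case-insensitive ("From"/"from").
RECEIVED = re.compile(
    r"^Received: from (?P<host_a>[^\[( ]+) ?"
    r"[\[(]? ?(?P<host_b>[a-z][-a-z0-9._]*)? ?"
    r"[\[(](?P<address>[0-9.]+)[\])]",
    re.IGNORECASE,
)


def parse_received(line):
    """Return (hostA, hostB or None, dotted number), or None if the
    line is not in the Sendmail-like form described above."""
    m = RECEIVED.match(line)
    if m is None:
        return None
    return m.group("host_a"), m.group("host_b"), m.group("address")
```

For a line such as
"Received: From mailhost.west.com(alt1west.com(333.2.44.55)" this
returns the two host names and the string "333.2.44.55", which can
then be checked octet by octet.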

Hope this helps,

/* era */

Here are the matches I found and my other reasons for rejecting them:

 $ cat ~/scratch/inbox/spam-filtered.* |
   formail -s procmail ~/scratch/testing/.rc

 Received: from clift.b89_crost.com (clift.b89_crost.com [199.3.12.256]
 X-Rejected: Spam score +1
 X-Rejected: Received: after From: [5]
 X-Rejected: Spam score 5
 X-Rejected: From killfiled domain @usa.net [5]
 X-Rejected: Over 6000 bytes [5]
 X-Rejected: body contains ugly words [5:18]
 X-Rejected: body contains too many URL:s [+13]
 Received: from in2.i_b_m.net (in2.i_b_m.net [165.87.194.259]
 X-Rejected: To: equals From: f5net(_at_)hotmail(_dot_)com
 X-Rejected: Suspect From: hotmail.com not in Received: lines
 X-Rejected: body contains ugly words [0:3]
 Received: From mailhost.alp.net(alt1engery.it.com(983.2.33.57)
 X-Rejected: No valid Message-Id
 X-Rejected: To: equals Reply-to: 973Jim(_at_)dsnnet(_dot_)it
 X-Rejected: Received contains earthlink
 Received: from clift.b89_crost.com (clift.b89_crost.com [199.3.12.256]
 X-Rejected: Spam score +1
 X-Rejected: Received: after From: [7]
 X-Rejected: Spam score 7
 X-Rejected: From killfiled domain @usa.net [7]
 Received: From mailhost.west.com(alt1west.com(333.2.44.55)
 X-Rejected: No valid Message-Id
 X-Rejected: Received: after From: [3]
 X-Rejected: Spam score 3
 X-Rejected: From .com [3]

Note that only four of these X-Rejected lines (To: equals
From:/Reply-to: and No valid Message-Id) are not based on somewhat
fuzzy and/or risky heuristics, so adding the sure-fire IP number
sanity check would probably be a good idea.

Relying on the faked Received: lines actually containing well-formed
host names might not be too wise: in the testing material I saw
several Received: lines whose host name had an exclamation mark in it.

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>