procmail

Trapping non-standard URL's in spam

1999-03-21 03:30:21
  Spammers have started monkeying with URLs in spam
messages to make them harder to track down, while
still providing working links to their webpages.
They've resorted to encoding an IP address as one 32-bit
number, and to stunts like octal or hex notation for the
dotted quads.  And for people who can't make up their minds,
you can even mix-n-match within one address (at least on Win98)...

=============================================
D:\>PING 209.0314.0xe1.30

Pinging 209.204.225.30 with 32 bytes of data:
=============================================
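Outside of ping, the same conversion can be reproduced to see where a mixed-notation URL really points. What follows is a minimal Python sketch, assuming the classic BSD inet_aton() rules (a 0x prefix means hex, a leading 0 means octal, anything else is decimal). The names parse_part and decode_host are made up for illustration, and only the single-number and four-part forms are handled.

```python
def parse_part(part: str) -> int:
    """Read one numeric part using the classic inet_aton() rules:
    a 0x prefix means hex, a leading 0 means octal, else decimal."""
    p = part.lower()
    if not p.isalnum():
        raise ValueError(f"bad part: {part!r}")
    if p.startswith("0x"):
        return int(p[2:], 16)
    if len(p) > 1 and p.startswith("0"):
        return int(p[1:], 8)
    return int(p, 10)


def decode_host(host: str) -> str:
    """Normalise a numeric host to dotted-decimal form.

    Assumption: only a single 32-bit number or four dot-separated
    parts (each in any base) are accepted.
    """
    parts = host.split(".")
    if len(parts) == 1:
        value = parse_part(parts[0])
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("address out of range")
        # split the 32-bit number into four bytes, high byte first
        octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    elif len(parts) == 4:
        octets = [parse_part(p) for p in parts]
        if any(not 0 <= o <= 255 for o in octets):
            raise ValueError("octet out of range")
    else:
        raise ValueError("only 1-part and 4-part forms are handled")
    return ".".join(str(o) for o in octets)


print(decode_host("209.0314.0xe1.30"))  # 209.204.225.30
print(decode_host("3519865118"))        # 209.204.225.30
```

The second call is the "one 32-bit number" trick: 209*2^24 + 204*2^16 + 225*2^8 + 30 = 3519865118.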

  What's *REALLY* insidious is that spammers might try
slipping a fast one past us by mixing decimal and octal.
A quad with a leading zero is read as octal, so 077 is 63,
not 77.  This could lead to misdirected complaints, e.g...

=============================================
D:\>ping 209.204.225.077

Pinging 209.204.225.63 with 32 bytes of data:
=============================================
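Before sending a complaint, it can help to list which parts of a dotted address change value when read as octal. A minimal Python sketch; misread_parts is a hypothetical helper, and it only looks at parts made of a leading 0 followed by octal digits.

```python
import re

# A dotted part with a leading zero followed only by octal digits.
OCTAL_PART = re.compile(r"0[0-7]+")


def misread_parts(host: str) -> list:
    """Return (text, decimal reading, actual octal value) for every
    part that a person and a resolver would read differently."""
    result = []
    for part in host.split("."):
        if OCTAL_PART.fullmatch(part):
            looks_like = int(part, 10)   # what a human sees
            really_is = int(part, 8)     # what the resolver uses
            if looks_like != really_is:
                result.append((part, looks_like, really_is))
    return result


print(misread_parts("209.204.225.077"))  # [('077', 77, 63)]
```

Parts such as 00 through 07 are skipped, since they have the same value in both bases.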

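  It's getting to be a reliable spam-sign when non-standard
URLs show up in the body of a message.  Here's a first try
at a procmail recipe to catch them.  And while we're at it,
let's not forget spammer-fudging with @'s: in a URL like
http://decoy@host/ the part before the @ is just a user name,
and the browser connects to whatever follows it.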

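 NONSTANDARD="(0x[0-9a-f]+|0[0-7]+)"
 # one host component: no dot, slash, @ or space
 HOSTPART="[^./@ ]*"
 # B = search the body, f = filter, h = feed only the header to formail
 :0 Bfhw
 *  1^0 http:(//|//[^/ ]*@)[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
 *  1^0 http:(//|//[^/ ]*@)0x[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]
 *$ 1^0 http:(//|//[^/ ]*@)${NONSTANDARD}\.${HOSTPART}\.${HOSTPART}\.
 *$ 1^0 http:(//|//[^/ ]*@)${HOSTPART}\.${NONSTANDARD}\.${HOSTPART}\.
 *$ 1^0 http:(//|//[^/ ]*@)${HOSTPART}\.${HOSTPART}\.${NONSTANDARD}\.
 *$ 1^0 http:(//|//[^/ ]*@)${HOSTPART}\.${HOSTPART}\.${HOSTPART}\.${NONSTANDARD}
 | formail -A "X-Reject: Non-standard URL format; often used by spammers"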
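To try candidate URLs without feeding test mail through procmail, the checks can be approximated in Python. This is a rough translation, not procmail itself; it assumes that the wildcard between dots is limited to one host component (HOSTPART, which cannot contain a dot, slash, @ or space), so dots in the path are not counted. The names PREFIX, HOSTPART, quad_check and score are made up for illustration.

```python
import re

# Same alternation as the procmail NONSTANDARD variable.
NONSTANDARD = r"(0x[0-9a-f]+|0[0-7]+)"
# "http:" followed by "//" or by "//anything@" (the @ trick).
PREFIX = r"http:(//|//[^/ ]*@)"
# Assumption: one host component cannot contain . / @ or space.
HOSTPART = r"[^./@ ]*"


def quad_check(position: int) -> str:
    """Pattern for a non-decimal quad at the given position (0-3)."""
    parts = [HOSTPART] * 4
    parts[position] = NONSTANDARD
    return PREFIX + r"\.".join(parts)


CHECKS = [
    PREFIX + r"[0-9]{8}",       # 1) integer IP, decimal or octal
    PREFIX + r"0x[0-9a-f]{7}",  # 2) integer IP, hex
] + [quad_check(i) for i in range(4)]  # 3) to 6) one check per quad

# procmail conditions are case-insensitive unless the D flag is used
PATTERNS = [re.compile(p, re.IGNORECASE) for p in CHECKS]


def score(text: str) -> int:
    """Mimic the 1^0 weights: every matching condition adds 1."""
    return sum(1 for p in PATTERNS if p.search(text))


for url in ["http://3519865118/",
            "http://0xd1cce11e/",
            "http://209.0314.0xe1.30/",
            "http://www.bank.com@0321.0314.0341.036/",
            "http://209.204.225.30/",
            "http://www.example.com/v1.05.html"]:
    print(score(url), url)
```

The first four URLs get a positive score.  The plain dotted-decimal address and the ordinary URL whose path happens to contain ".05." both score 0.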

  The variable NONSTANDARD matches a single quad written in
hex (0x prefix) or octal (leading 0).  The individual
condition lines work as follows...

1) Check for an IP written as one integer (either decimal or
   octal).  In base 10, 1.0.0.0 translates to 16777216, so a
   working IP address needs at least 8 digits; in base 8 it
   needs 9 digits plus the leading 0.

2) Check for an integer IP in base 16.  1.0.0.0 = 0x1000000,
   so at least seven hex digits are needed after the 0x.

3) through 6) Check for non-standard (non-base-10) notation
   in each of the quads separately, just in case spammers
   try to mix-n-match.

  Each condition is weighted 1^0, so a single matching line
gives the recipe a positive score and the email gets flagged
with the X-Reject: header.
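The digit-count thresholds in 1) and 2) are easy to double-check.  A short Python sketch, using a made-up helper digit_counts, computes how many digits the smallest (1.0.0.0) and largest (255.255.255.255) addresses need in each base, not counting the 0 or 0x prefix:

```python
def digit_counts(value: int) -> dict:
    """Digits needed to write value in decimal, octal and hex
    (not counting the 0 or 0x prefix)."""
    return {
        "dec": len(str(value)),
        "oct": len(format(value, "o")),
        "hex": len(format(value, "x")),
    }


print(digit_counts(1 << 24))        # {'dec': 8, 'oct': 9, 'hex': 7}
print(digit_counts((1 << 32) - 1))  # {'dec': 10, 'oct': 11, 'hex': 8}
```

So an integer IP runs 8 to 10 digits in decimal, 9 to 11 in octal after the leading 0, and 7 to 8 in hex after the 0x; the recipe only tests the lower bounds.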
Any comments, suggestions, etc?

-- 
Walter Dnes <waltdnes(_at_)interlog(_dot_)com> procmail spamfilter
http://www.interlog.com/~waltdnes/spamdunk/spamdunk.htm
Why a fiscal conservative opposes Toronto 2008 OWE-lympics
http://www.interlog.com/~waltdnes/owe-lympics/owe-lympics.htm
