procmail

Re: Spam: Are You In Need Of A Lifestyle Change

1997-09-28 23:29:09
On Sun, 28 Sep 1997 22:44:05 -0500 (CDT),
Jeff Thieleke <thieleke(_at_)ix(_dot_)netcom(_dot_)com> wrote:
:0
* ^Subject:.*are\ you\ in\ need\ of\ a\ life
/dev/null

It bears pointing out that you don't need to backslash-escape the
spaces. Might be a good idea, though, to try this (due to Rik Kabel): 

  * ^Subject:.*are\<*you\<*in\<*need\<*of\<*a\<*life

(BTW, wouldn't \<+ work better?)
  In my own recipes, I've used "life style" alone as a good sign that
a message is spam. (Other good ones include debt, phone card, long
distance, xxx, adult, etc, as well as the obvious MLM and FREE. A
friend of mine made the observation that a lot of spam and very little
legit mail includes the word "you" in the subject but I'm too chicken
to try that :-)
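
For illustration, the subject keywords above could be combined with
procmail's weighted scoring so that no single word condemns a message.
This is only a rough sketch: the weights, the two-hit threshold and the
"spam-tank" folder name are assumptions made up for the example.

```procmail
# Sketch only: weights, threshold and the "spam-tank" folder are assumptions.
# The first condition subtracts 1 once for any message with a Subject:,
# so one keyword hit leaves the score at 0 (no delivery); two hits make it
# positive and the recipe fires.
:0:
* -1^0 ^Subject:
* 1^0 ^Subject:.*life ?style
* 1^0 ^Subject:.*(debt|phone card|long distance)
* 1^0 ^Subject:.*(xxx|adult|mlm)
spam-tank
```

Procmail matches case-insensitively by default, so a plain "free" would
also hit ordinary lowercase uses; that one is left out of the sketch.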

From: N8dx1k7gM(_at_)unlimited(_dot_)net

I've been thinking about ways to catch these. They're fairly obvious
to the human eye but hard to pin down in any meaningful way. Ideas,
anyone? Also note that there is no "unlimited.net" anywhere in the
Received: lines. (Shouldn't a strictly conformant message have a
Sender: with the real sender ID if you're overriding From: and if so,
how many are doing this in practice?)

Received: from ctcpXzPDJ  (dd30-242.dub.compuserve.com [199.174.147.242])
                 ^^^^^^^^???
by mustang.via.net (8.6.9/8.6.9) with SMTP id LAA28431; Sat, 27 Sep 1997

The simple fact that the stuff in the parens doesn't match what the
sender said is already a good clue. It happens a lot on legitimate
mail but it's a good thing to include in a scoring recipe. 

Message-ID: <BrS5>
Does anyone have a good Message-Id: recipe?  I came up with one that
validated Sendmail Message-Id's, but programs like Pine and qmail have
their own variations that break this.
* ^Message-Id: (<>|<none>|0000000000.\AAA000)
catches the obvious fakes, but not ids such as "BrS5"

Here's what I've been using. There is software out there that breaks
RFC822 in that it doesn't include an "@" in the Message-Id. I don't
care too much since I see those messages in my spam tank, but if you
send stuff to /dev/null, you'll probably want to take out the @ part. 

:0
* ! ^Message-Id:[       ]*<[^   <>@]+@[^   <>@]+>[         ]*$
{ REJECT="$REJECT${REJECT:+$NL}${REJ}No valid Message-Id" }
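
If mail without an "@" in the Message-Id has to get through, a relaxed
variant might look like the sketch below. It only insists on a single
non-empty <...> token with no whitespace or nested angle brackets, so it
still catches "<>" and a missing header but lets ids like "<BrS5>" pass.
REJECT, NL and REJ are the variables used in the recipe above; their
definitions are not shown here, so they are assumed to be set earlier in
the rc file.

```procmail
# Sketch: same test as the recipe above, minus the requirement for an "@".
# Each bracket pair [ 	] is meant to contain one literal space and one tab.
:0
* ! ^Message-Id:[ 	]*<[^ 	<>]+>[ 	]*$
{ REJECT="$REJECT${REJECT:+$NL}${REJ}No valid Message-Id" }
```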


Received: From mailhost.UTP.net(alt1.utp..net(333.2.44.55)) by utp.net;Sat,
                                          ^^    ^^^        ^^
Oops!  IP (IPv4) numbers are 8-bit values (0-255)...333 is no good.  There is a
recipe for this type of fakery, but I don't have ready access to it
at the moment.   Can someone repost it?

I only have badly working ones on file. The primary problem with these
is that there will be other numbers in those headers which look a lot
like IP numbers unless you preparse them a little bit (for instance,
Microsoft Mail Server Received: lines contain a version number which
is something like 4.0.994.63) but you can get pretty far by looking
only at Received: lines which are more or less like what Sendmail
generates and see if there's a "reverse lookup" number which looks
faked. The general format of these is 

  Received: from hostB (hostC [IP number]) by hostA

but you'd have to find an efficient way to fish out the IP number from
all of them and look at each. (The semi-obvious Procmail-only solution
ends up looking only at the first one. You could make it look at only
the last one instead and be fairly safe that this is almost always
right, but it all smacks of kludgery in the end. Anybody have an
elegant solution?)
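
One narrow piece of this needs no extraction at all: an octet above 255
can be spotted with a plain regexp, and since nothing is fished out into
a variable the condition is tried against every Received: line. The
sketch below is hedged accordingly. It only looks at addresses in square
brackets (so version numbers like 4.0.994.63 are ignored, but so is a
fake written in parentheses), it says nothing about addresses that are
in range but forged, and it reuses the REJECT, NL and REJ variables from
the Message-Id recipe on the assumption that they are defined elsewhere.

```procmail
# Sketch: flag a bracketed address in Received: with an impossible octet.
# The alternatives cover 256-259, 260-299, 300-999, and any run of four
# or more digits; the ([0-9]+\.)* and (\.[0-9]+)* parts keep the match
# aligned on octet boundaries inside the brackets.
:0
* ^Received:.*\[([0-9]+\.)*(25[6-9]|2[6-9][0-9]|[3-9][0-9][0-9]|[0-9][0-9][0-9][0-9]+)(\.[0-9]+)*\]
{ REJECT="$REJECT${REJECT:+$NL}${REJ}Impossible IP number in Received:" }
```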
  Note that you'd generally want hostB and hostC to be more or less
the same, but you can't dump a message merely because they don't
match. For one thing, you often see host names with aliases (e.g.
moo.net (mail.moo.net [123.45.67.89]) and even a.net (b.com)).
  Another thing I've been trying somewhat unsuccessfully to match is
the fairly common spammer trick of saying HELO receivinghost, resulting in
Received: from receivinghost (otherhost [blah blah]) by receivinghost 
but that's not a sure sign it's spam, either.
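
Procmail regexps have no backreferences, so the general "same name on
both sides" test is awkward, but if you only care about your own
receiving host you can hard-code its name. A minimal sketch follows,
assuming mail.example.com stands in for that host and that the from and
by clauses are seen on one logical line. Because legitimate local mail
can look like this too, it only adds a reason to REJECT (the variable
from the Message-Id recipe, assumed to be defined elsewhere) rather than
discarding anything.

```procmail
# Sketch: mail.example.com is a placeholder for your own receiving host.
# Matches when the name given after "from" (the HELO name) and the name
# after "by" are both that host. [ 	] holds one space and one tab.
:0
* ^Received: from mail\.example\.com[ 	].*by mail\.example\.com
{ REJECT="$REJECT${REJECT:+$NL}${REJ}HELO claims to be the receiving host" }
```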

Where "MyEmailAddress" is replaced by your email address(es).  By dumping 
everything that is not specifically addressed to you to a non-default
folder, you virtually eliminate all spam that escapes your other filters.
This is after you filter out mailing lists and such, of course.

This is dubious advice, but you probably know that already. Some
people receive legitimate BCC:s, others don't. 

address, this spam actually has fairly clean headers.  It should have still

Huh? It's +terribly+ forged. Most of the Received: headers will always
look more or less legitimate because they're added by legitimate
software. One faked Received: line and you're dead in my book, though.
Also note that the domain on the To: address was added at
mustang.via.net. Could have been forwarded to you from there, but
that's another thing to hang on to. Finally, I've been thinking about
a recipe to catch the situation where there's a From: and a Reply-To:
but neither appears in any legit Received: lines. (Okay, if you had
missed the fact that the final Received: is fake, this one would have
slipped through that crack, but not a Message-Id sanity check.) 
  Also, Felix, did your local software add the X-Uidl header or was it
in the spam itself?

/* era */

-- 
 Paparazzi of the Net: No matter what you do to protect your privacy,
  they'll hunt you down and spam you. <http://www.iki.fi/~era/spam/>