procmail

Re: A few rule questions

2003-12-14 18:02:10
At 15:57 2003-12-14 -0500, JoeHill wrote:

Note that, as I explain on my website, I use what's called "SPAMMISHNESS" - a score threshold I define, above which a message is considered spam. Individual rules intended to identify spammy characteristics each add a number to the running total, which allows me to use less definite things as indicators. I wouldn't heave a piece of mail just because its date stamp shows it a day older than when I received it, but that's certainly good for adding a few points towards the message being identified as spam. The same goes for freemail messages that don't appear to have actually come through the freemail service they claim.

Through the use of spammishness, you can make use of criteria that aren't always indicators of spam. Perhaps your mother is one of those idiots^H^H^H^H^Hpeople who send HTML-only email (instead of plain text with an alternate HTML copy attached, which is how a legit mailer would do it). You can still use that as a spammish indicator (without even having to whitelist your mother), so long as the score you assign to it doesn't exceed the threshold. I use a threshold of about 250, and quite a few recipes that add only 25 to 40 points to the spam score. These are there to push messages that already score high on some other criteria over the limit, or, if there's a LOT of such little things wrong with a message, another rule adds more points just because of the number of problems.
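To make the arithmetic concrete, here is a rough Python illustration of a spammishness tally. It is not a procmail recipe and not the author's actual rule set: only the 250 threshold, the 25-40 point "little things", and the 175 points for a bad Date: header come from the post. The rule names, the "four little problems" trigger, and the 100-point bonus are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

THRESHOLD = 250          # threshold mentioned in the post
SMALL_RULE_MAX = 40      # rules worth 25-40 points are the "little things"
MANY_SMALL = 4           # hypothetical: this many little hits earns a bonus
MANY_SMALL_BONUS = 100   # hypothetical bonus size

Headers = Dict[str, str]
Rule = Tuple[str, int, Callable[[Headers], bool]]


def spammishness(headers: Headers, rules: List[Rule]) -> Tuple[int, List[str]]:
    """Add up the points of every rule that matches; return (total, fired)."""
    total, fired, small_hits = 0, [], 0
    for name, points, matches in rules:
        if matches(headers):
            total += points
            fired.append(name)
            if points <= SMALL_RULE_MAX:
                small_hits += 1
    # A separate rule that scores the sheer number of minor problems.
    if small_hits >= MANY_SMALL:
        total += MANY_SMALL_BONUS
        fired.append("many-small-problems")
    return total, fired


def is_spam(total: int) -> bool:
    # Only a total that exceeds the threshold marks the message as spam.
    return total > THRESHOLD


# Hypothetical rules; only the 175 for a bad Date: comes from the post.
RULES: List[Rule] = [
    ("html-only", 25,
     lambda h: h.get("Content-Type", "").lower().startswith("text/html")),
    ("no-message-id", 40, lambda h: "Message-ID" not in h),
    ("missing-date", 175, lambda h: not h.get("Date", "").strip()),
]
```

An HTML-only note from your mother scores 25 and sails through; even three of these rules together reach only 240, which stays under the threshold unless something else piles on.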

But I *could* make them into separate recipes, one for To, one for Cc, though
this would not be as elegant as below, of course, no?

It wouldn't allow you to accurately _count_ the matches, or allow for a variable number of contributors between the two headers. Yes, if you wanted to check for three in either header INDIVIDUALLY, you could do that, but that's not the same as a _total_ of three or more.
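The difference between per-header checks and a combined count can be sketched in Python (an illustration of the counting logic, not a procmail recipe). The domain `sympatico.ca` and the "more than 2" limit are assumptions based on the Sympatico example in this thread.

```python
from email.utils import getaddresses
from typing import List

MAX_SAME_DOMAIN = 2  # assumption: "more than 2 people in the Sympatico domain"


def count_in_domain(to_values: List[str], cc_values: List[str], domain: str) -> int:
    """Count addresses in `domain` (or its subdomains) across To: and Cc: combined."""
    domain = domain.lower()
    total = 0
    # getaddresses() splits every header value into (display name, address) pairs.
    for _name, addr in getaddresses(to_values + cc_values):
        _user, at, host = addr.lower().rpartition("@")
        if at and (host == domain or host.endswith("." + domain)):
            total += 1
    return total


def too_many_in_domain(to_values: List[str], cc_values: List[str], domain: str) -> bool:
    return count_in_domain(to_values, cc_values, domain) > MAX_SAME_DOMAIN
```

With one match in To: and two in Cc:, neither header alone trips a "three or more" test, but the combined total does.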

That's a beaut, thanks. I *never* get legit mail which is addressed/cc'd to more than 2 people in the Sympatico domain. In fact, I can't remember the last time I got a legit mail which was addressed/cc'd to *only* 2 people in the Sympatico domain. 99% of my mail is from lists or people who run their own mailservers (i.e. not newbs like me).

I have a recipe sequence that identifies the recipient name in an email and checks for duplication of that username in other recipients - some spammers send messages to a series of "joe(_at_)domain, joe(_at_)domain2, joe(_at_)domain3" addresses.
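A rough Python illustration of that duplicate-username check follows (not the author's recipe sequence). It assumes you already know your own address, compares usernames case-insensitively, and the example domains are placeholders.

```python
from email.utils import getaddresses
from typing import List


def same_user_elsewhere(my_address: str, field_values: List[str]) -> int:
    """Count other domains that received mail addressed to my username."""
    my_user, _, my_host = my_address.lower().rpartition("@")
    other_hosts = set()
    for _name, addr in getaddresses(field_values):
        user, at, host = addr.lower().rpartition("@")
        # Same username at a different domain: the joe@domain, joe@domain2 pattern.
        if at and user == my_user and host != my_host:
            other_hosts.add(host)
    return len(other_hosts)
```

A non-zero result could then feed a few points into the spammishness total rather than rejecting the message outright.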

Bogus dates! Brilliant! If a piece of mail takes 3 days to get to me, it
probably ain't worth reading anyway, right? Love it.

Well, there's old or advance-dated mail, and then there's mail where the Date: field can't be parsed as valid by the unix 'date' program. Both are useful indicators. I even score higher for WAY out-of-range dates, and have an allowance for list-delivered email to lag more. More than 18 hours old is iffy and worth 100 points; more than 72 hours old is worth 100 more. A clock more than 2 hours ADVANCED (with time zones already factored in) is worth 100 points. An invalid Date: header is worth 175.
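The author does this inside procmail with the unix `date` program; as a Python illustration of the same arithmetic only, the sketch below scores a Date: header against the local receipt time using the post's numbers (100 points past 18 hours, 100 more past 72, 100 for a clock more than 2 hours ahead, 175 for an unparseable header). The 24-hour extra lag allowed for list mail is an assumption, since the post doesn't give a figure, and so is treating a `-0000` zone as UTC.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

INVALID_DATE_POINTS = 175
LIST_EXTRA_LAG = timedelta(hours=24)  # assumption: the post gives no number


def date_points(date_header: str, received: datetime, list_mail: bool = False) -> int:
    """Score a Date: header against the (timezone-aware) time of receipt."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError, IndexError, OverflowError):
        # Older Pythons raise TypeError on garbage, newer ones ValueError.
        return INVALID_DATE_POINTS
    if sent.tzinfo is None:
        sent = sent.replace(tzinfo=timezone.utc)  # assumption for "-0000"
    skew = received - sent  # time zones are factored in by aware datetimes
    if skew < -timedelta(hours=2):
        return 100  # dated more than 2 hours in the future
    slack = LIST_EXTRA_LAG if list_mail else timedelta(0)
    points = 0
    if skew > timedelta(hours=18) + slack:
        points += 100
    if skew > timedelta(hours=72) + slack:
        points += 100
    return points
```

Keeping the result as points, rather than a yes/no, is what lets a bogus date contribute without trashing the message on its own.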

So, with the settings I'm using, a bogus date header all by itself isn't enough to trash a message as spam - but there are almost _ALWAYS_ a number of other spammish characteristics about a spam message.

...and of course as soon as I implement this rule, this particular piece of spam will die out...heh. So far I'm getting one a day though.

It'll probably resurface, though. I can't say I've ever received any that conformed to that.

Good point. That explains why even though I have some rules that check for
"viagra" in the body (a lot simpler, you would think), they still come through.

Checking the body is also costly processor-wise, since you're scanning it over and over again looking for each keyword.
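One way to avoid re-reading the body once per keyword is to fold the keywords into a single alternation, so the text is walked in one pass; in procmail the analogous move is one body condition with `(word1|word2|...)` instead of several recipes. The Python sketch below illustrates the idea; the keyword list is hypothetical.

```python
import re
from typing import List

KEYWORDS = ["viagra", "mortgage", "casino"]  # hypothetical keyword list

# One compiled alternation: a single pass over the body finds any of them.
COMBINED = re.compile(
    r"\b(?:" + "|".join(map(re.escape, KEYWORDS)) + r")\b", re.IGNORECASE
)


def keyword_hits(body: str) -> List[str]:
    """Return the distinct keywords found, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in COMBINED.finditer(body)})
```

The regex engine still tries each alternative at each position, so this saves repeated passes over the text rather than making matching free.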

You'll enjoy perusing the procmail list archives (which are searchable - see the link on the procmail homepage), where you'll find a great many antispam rules: an abundance of symbols or runs of whitespace in the subject, the recipient's username in the subject, an apparent website in the subject, etc.
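The subject heuristics listed above can be sketched in Python to show what each test looks for. This is not taken from the list archives: the symbol limit of 6, the "3 or more" whitespace run, and the short list of top-level domains are all assumptions.

```python
import re
from typing import List

SYMBOL_LIMIT = 6  # assumption: how many symbols count as an "abundance"
WEBSITE = re.compile(r"\b(?:www\.)?[a-z0-9-]+\.(?:com|net|org|biz|info)\b", re.IGNORECASE)


def subject_flags(subject: str, recipient_user: str) -> List[str]:
    """Return the names of the subject heuristics that fire."""
    flags = []
    symbols = sum(1 for c in subject if not c.isalnum() and not c.isspace())
    if symbols >= SYMBOL_LIMIT:
        flags.append("many-symbols")
    if re.search(r"\s{3,}", subject):
        flags.append("whitespace-run")
    # The recipient's username as a whole word in the subject.
    if recipient_user and re.search(
        r"\b" + re.escape(recipient_user) + r"\b", subject, re.IGNORECASE
    ):
        flags.append("username-in-subject")
    if WEBSITE.search(subject):
        flags.append("website")
    return flags
```

Each flag would then add a modest number of points rather than condemning the message by itself.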

I think, based on your advice, I'll leave the body checks out :-)

Well, there are times they are useful, and times they aren't. When you get more familiar with procmail, you can do things like:

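# only for messages less than 30K in size
:0
* < 30000
{
        # some_html_base64_scrubber is a placeholder, not a real program.
        # A backquoted command in an assignment receives the message on
        # stdin; its output (the decoded plain text) lands in CLEANBODY.
        CLEANBODY=`some_html_base64_scrubber`

        :0:
        * CLEANBODY ?? (plain|keywords)
        scum.mbx
}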

No, there isn't a ready-made html_base64_scrubber kicking around, though there are some adaptable programs, such as lynx and mimencode.
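For a sense of what such a scrubber has to do, here is a rough Python stand-in (not lynx or mimencode, and not an existing tool): it undoes base64 or quoted-printable encoding on the text parts, strips HTML tags, and applies the same 30000-byte size gate as the recipe. The function names and the demo message are hypothetical.

```python
import base64
import re
from email import message_from_bytes, policy
from html.parser import HTMLParser

MAX_SIZE = 30000  # the "< 30000" size gate from the recipe


class _TextOnly(HTMLParser):
    """Collects text nodes and drops tags (script/style text is kept)."""

    def __init__(self) -> None:
        super().__init__()
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def clean_body(raw: bytes) -> str:
    """Decode the text/plain and text/html parts and strip HTML tags."""
    msg = message_from_bytes(raw, policy=policy.default)
    chunks = []
    for part in msg.walk():
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        try:
            text = part.get_content()  # undoes transfer encoding and charset
        except LookupError:            # unknown charset: skip this part
            continue
        if ctype == "text/html":
            parser = _TextOnly()
            parser.feed(text)
            parser.close()
            text = "".join(parser.parts)
        chunks.append(text)
    return "\n".join(chunks)


def is_scum(raw: bytes, pattern: str) -> bool:
    """Only small messages get the (costly) body check."""
    if len(raw) >= MAX_SIZE:
        return False
    return re.search(pattern, clean_body(raw), re.IGNORECASE) is not None


def demo_message(html: bytes) -> bytes:
    """Build a base64-encoded, HTML-only test message."""
    return (b"From: a@example.com\nTo: b@example.com\nSubject: hi\n"
            b"MIME-Version: 1.0\nContent-Type: text/html; charset=us-ascii\n"
            b"Content-Transfer-Encoding: base64\n\n"
            + base64.b64encode(html) + b"\n")
```

Checking the decoded text is what lets a keyword match survive base64 encoding and tags wrapped around individual words.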

lifetime of learning, at least for me. Main point is, if I can just keep my use
of the delete key down to once or twice a day, I'll consider it a victory!

On average I receive 600-700 email messages into my inbox each day (well, that many actually reach my MUA). Among them, I get 6-8 spams a month, and that number has been petering off. I use DNSBLs at the MTA level, plus my own collection of procmail recipes.

I also don't receive viruses, because messages with executable attachment types are shuttled off by a server-global procmailrc, with an advisory notice forwarded to the intended recipient.
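The detection half of that setup can be sketched in Python (an illustration, not the author's server-global procmailrc): walk the MIME parts, check each attachment's filename extension, and build a short advisory text. The extension list and the notice wording are assumptions; the post doesn't say which types are treated as executable.

```python
import os
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import List

# Assumed list of "executable attachment types"; the post doesn't list them.
EXECUTABLE_EXTENSIONS = {".exe", ".com", ".scr", ".pif", ".bat", ".cmd", ".vbs", ".js"}


def executable_attachments(raw: bytes) -> List[str]:
    """Return the filenames of attachments whose extension looks executable."""
    msg = message_from_bytes(raw, policy=policy.default)
    found = []
    for part in msg.walk():
        name = part.get_filename()  # from Content-Disposition or Content-Type
        if name and os.path.splitext(name.lower())[1] in EXECUTABLE_EXTENSIONS:
            found.append(name)
    return found


def advisory_notice(recipient: str, sender: str, names: List[str]) -> str:
    """Hypothetical wording for the notice sent to the intended recipient."""
    return (f"To: {recipient}\nSubject: Message quarantined\n\n"
            f"A message from {sender} was held because it carried "
            f"executable attachment(s): {', '.join(names)}\n")


def demo_message(filename: str) -> bytes:
    """Build a small test message with one attachment."""
    m = EmailMessage()
    m["From"] = "a@example.com"
    m["To"] = "b@example.com"
    m["Subject"] = "invoice"
    m.set_content("see attached")
    m.add_attachment(b"MZ", maintype="application", subtype="octet-stream",
                     filename=filename)
    return m.as_bytes()
```

Only the last extension counts, so a double-extension name like `invoice.pdf.exe` is still caught.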

BTW, I had a good chuckle over the Red Hat comments in the disclaimer page,
though I'm hesitant to ask what you think of Mandrake...;-)

Mandrake's main liability is funding (go commercial or remain fully open source - the somewhere-in-between state isn't really good). I've used Slackware for most of my own boxes for a very long time (and I compile everything - I don't use packages or RPMs), while I administer a small fleet of FreeBSD systems and a Debian box or two.

My beefs with Red Hat hinge on their desire to do everything _differently_ from everybody else, which doesn't make for a portable or familiar setup, and on their repeated demonstrations of an inability to produce a thoroughly tested distro (shipping a bogus version of a C compiler in a boxed version is inexcusable in my book).

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
