Re: Spammish?

On 16 Feb, fleet(_at_)teachout(_dot_)org wrote:
| On Sun, 16 Feb 2003, Don Hammond wrote:
| 
| > This is a little long, but maybe another example will help.
| 
| Thanks very much for the examples!

You're welcome.

| > My spamchkrc currently has 40 spam tests.
| 
| I have 29 checks on the Message-ID alone! :)

I used to spend way too much time, and get way too aggravated, chasing
my tail creating and updating filters. I got sick of it and decided to
get more aggressive at the MTA level. Consequently I've fallen years
behind on my procmail spam recipes, but happily see at most a couple of
spam messages a week. Sendmail is turning away between 15 and 20 a day.
Procmail for me is much more about reliably identifying good mail than
bad.  I half-heartedly maintain the spam filters more as an exercise
than a necessity.  The real focus is on the process, more than the
actual filters. The day will come when I'll no longer be able to
indiscriminately block entire netblocks like I do now, and when it does
I want to be confident that the framework is solid.  Then I'll get up
to speed on current techniques. So, right now, I don't have 29 checks
for Message-Id. ;-)

| [...]
| Very interesting.  I'm particularly intrigued by the counting recipes.
| May I contact you later for more details?  (I'm not very well versed, yet,
| in the scoring method.)

No problem.  As I mentioned in the original post, my mail servers are
my own and I don't worry much about processing efficiency.  I'm not
reckless about it, but the way I do some things might cause others to
turn up their noses.  Counting recipients is one of those.

Over time, I found myself reusing things in many different rcfiles, so
there are now 2 rcfiles that do little more than set variables for all
the rest that follow.  That means some variables get set and never used
for some messages, but overall it's much easier to maintain than having
them peppered throughout all the rcfiles.  It also means I don't have to
rescan common headers over and over. I scan them once then use the
variables. I combine some things in those 2 files for simplicity, but
sometimes at a cost. I use a perl one-liner to populate variables with
the To: and Cc: headers stripped down to just the addresses, and at the
same time return the count.  There are much better ways to get the
count.

I know this has been discussed, but I'm too tired to dig it out.
Probably something like this is reasonable (though untested):

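# The bracket expressions are meant to hold a space and a tab.
# \/ starts the capture; [^  ].* puts the rest of the header line into
# MATCH, not just the first address.
:0
* ^To:[  ]*\/[^  ].*
* 1^1 MATCH ?? ,
* 1^0
{ TOCOUNT = $= }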

TOCOUNT = ${TOCOUNT:-0}

Unless my brain is totally on hiatus, that should count commas and add
one. If there's no To: header, the scoring never gets done so TOCOUNT
defaults to 0 (zero) and you know it always has a numeric value for use
in subsequent recipes.  If you find yourself scanning the To: header
repeatedly, then it might be worth saving it to a variable as part of
this step. Then later recipes don't have to scan the headers again, but
can simply test your variable via the VAR ?? syntax. Something like:

:0
* ^To:[  ]*\/[^  ].*
{ TO = "$MATCH" }

:0
* 1^0 TO ?? .
* 1^1 TO ?? ,
{ TOCOUNT = $= }
:0E
{ TOCOUNT = 0 }

Ditto for Cc:
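
Spelled out for Cc:, that would be something like the following
(equally untested; CC and CCCOUNT are names made up for illustration,
and the bracket expressions are meant to hold a space and a tab):

```procmail
# Save the Cc: header contents once, if there is a Cc: header.
:0
* ^Cc:[  ]*\/[^  ].*
{ CC = "$MATCH" }

# One point for a non-empty CC plus one point per comma.
:0
* 1^0 CC ?? .
* 1^1 CC ?? ,
{ CCCOUNT = $= }
# No Cc: header (or an empty one): make sure the count is still numeric.
:0E
{ CCCOUNT = 0 }
```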

Then instead of doing:

* ^To: someone@somewhere

you do:

* TO ?? someone@somewhere

| > Having a separate rcfile to cumulate variables and scores means I don't
| > have to duplicate the same code over and over (except for the common
| > variable assignments and INCLUDERC).  It's much easier for maintenance.
| 
| I was wondering about maintenance.

I break things into small pieces.  For example, rc.spamtype I mentioned
in the previous message is called for every spam recipe that matches.
If I want to change logging, or header munging, etc., I only have to
change it in one place instead of changing each recipe individually. It
also keeps the diagnostics uniform across all recipes. That, along with
things like .00varsrc and .01varsrc mentioned above, is how I try to
ease maintenance.

| [...]
| How does one set up the various scores for a recipe?  I mean, what
| determines if a hit amounts to 2 or 1.38 or whatever?

If you're asking about my examples specifically, the value I use for
each recipe is assigned to ADDSPAMSCORE. The cumulative total is kept in
SPAMSCORE, something like (for each matching recipe):

:0
* $    $SPAMSCORE^0
* $ $ADDSPAMSCORE^0
{ SPAMSCORE = $= }

All that's doing is incrementing SPAMSCORE by the current value of
ADDSPAMSCORE (and retaining the new total in SPAMSCORE).
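
To make that concrete, here is a minimal sketch of one spam test
feeding the total. It is untested, and the Subject pattern and the
weight of 2 are invented for illustration, not taken from my real
rcfiles. It assumes SPAMSCORE has been given a numeric starting value
before any test runs:

```procmail
# Start every message from a known numeric value.
SPAMSCORE = 0

# Hypothetical spam test: pattern and weight are made up.
:0
* ^Subject:.*free money
{
  ADDSPAMSCORE = 2

  # Old total plus this test's weight becomes the new total.
  :0
  * $ $SPAMSCORE^0
  * $ $ADDSPAMSCORE^0
  { SPAMSCORE = $= }
}
```

If the test matches, that should leave SPAMSCORE at 2. One thing to
watch: a scored recipe only succeeds when the total is positive, so if
the sum ever came out at zero or below, the assignment would be skipped
and SPAMSCORE would keep its old value.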

-- 
Email address in From: header is valid  * but only for a couple of days *
This is my reluctant response to spammers' unrelenting address harvesting


