Re: continue scoring on next recipe

2002-01-18 15:26:14
At 19:47 2002-01-18 +0000, Stig Brautaset wrote:

[accumulating a score across several recipes]

As long as you're mostly just adding to and subtracting from the score, the following procedure should be very effective. If you want to evaluate the total score as you go (say, to bail out early once a threshold is exceeded), its effectiveness may not be as great, but it'll still be better than repeatedly filtering the message to add headers.

I solved it by creating a short C program called `add' that just adds up
its commandline arguments (if they are numbers). If the program is
called with the `-f' option, it will return the sum of its arguments as
its exit code.

Tricky if the sum becomes substantial: an exit code can only carry values from 0 to 255.

Why not use the arithmetic expansion offered by a good Bourne-type shell?

        echo $(( 12 + 3 ))

(this works with Bourne-based shells like bash -- you should check that yours supports it before basing scripts on it)

so, in procmail, you could get the math result from:

somescore=`echo $(( 12 + 3 ))`
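If you want to confirm that your shell understands the $(( )) form before building recipes on it, something along these lines may help. It is only a sketch: the variable names result and arith_ok are made up for illustration, and the test runs inside a subshell with eval so that a shell which rejects the syntax does not abort the whole script.

```shell
#!/bin/sh
# Hypothetical check: does this shell support $(( )) arithmetic expansion?
# The eval runs in a subshell so a syntax error there cannot kill this script.
if result=$( (eval 'echo $(( 12 + 3 ))') 2>/dev/null ) && [ "$result" = 15 ]; then
    arith_ok=yes
else
    arith_ok=no
fi
echo "arithmetic expansion supported: $arith_ok"
```

Run it with the same shell that procmail's SHELL variable points at, since that is the one your recipes will actually use.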

A few pointers to what can be a good idea to include checks for (and
their severity) is appreciated; e.g. do spammers usually skip or add any
headers?

I think perhaps you should read the volumes of material which have been posted through this list and examine the various spam filters which exist. There are a lot of criteria, some more likely than others to identify spam, but it differs from individual to individual, as does the weighting and what is considered an acceptable level of collateral damage.

# this checks the header only
[snip]

I'd probably just maintain an ongoing score value rather than stuffing it into the header at each individual step. There is much less overhead that way (at least as it pertains to rewriting the message - we're still spawning other processes to perform the tabulation). When you've passed all the rules which add to the scoring, you could add it to the header, but still probably do your comparison against the score variable you stored the results in.

#set initial value
myscore=0

# this checks the header only and is CASE SENSITIVE (D flag)
:0 D
* 2^1.5 (!|\?|\*|\$)
* 10^1  ^(SUBJECT|TO|FROM|DATE)
{
        # $= is the score of the recipe that just matched; this builds
        # a TEXT string like "0 + 20", no arithmetic is done yet
        myscore="$myscore + $="
}

# do some checks in the body as well
:0 B
* 20^1 (flame|sex)
{
        # appends to the same TEXT string, e.g. "0 + 20 + 20"
        myscore="$myscore + $="
}

# file as spam if accumulated score is more than defined limit
# (LIMIT is not set anywhere in this example; assign it earlier in the rcfile)
:0
* ? test $(( $myscore )) -gt $LIMIT
{
        # note scoring in header if and only if this matches.  move this
        # ABOVE the encapsulating rule if you always want the header added.
        # The 1* bit seems to be required to force the evaluation under
        # certain situations.
        :0f
        | formail -I "X-MyScore: $(( 1 * $myscore ))"

        :0:
        spam
}

The net result is that you invoke the shell ONCE to perform the test, and optionally once more to insert the header if you actually want it in your messages. Your version relies on a support app and adds MULTIPLE headers to the message, with a shell invocation each time that pipes the whole message through. That is a big waste of processing and memory if the message is huge.
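To see the accumulate-as-text idea outside of procmail, here is a rough shell sketch. The numbers 12, 20 and -5 stand in for whatever $= would hold after each matching recipe, and LIMIT=25 is an arbitrary threshold chosen for the demonstration; none of these values come from a real rcfile.

```shell
#!/bin/sh
# Stand-ins for the per-recipe scores procmail would supply via $=
myscore=0
LIMIT=25    # hypothetical threshold, pick your own

for hit in 12 20 -5; do
    # plain string concatenation; nothing is evaluated at this point
    myscore="$myscore + $hit"
done
echo "expression: $myscore"

# the single arithmetic evaluation, done once at the end
total=$(( $myscore ))
echo "total: $total"

if [ "$total" -gt "$LIMIT" ]; then
    verdict=spam
else
    verdict=ham
fi
echo "verdict: $verdict"
```

Negative scores need no special handling here, since "+ -5" is a valid term in a shell arithmetic expression.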

Here is the C program source:

[snip]

Here is the shell source <g>:

        echo $(( math_expression ))


If you don't have a Bourne-like shell supporting arithmetic operators, there is always 'bc', which needs the expression echoed and piped into it. I use bash, so I don't bother with it in these cases (and the workaround you'd need to use, "echo $myscore | bc", won't easily work from bash due to the way subshells are invoked for pipelines).
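Another candidate, not discussed above, is expr, which takes the expression as separate arguments and so needs no pipe. The following is only a sketch with a made-up score string, and it assumes the score contains nothing but integers joined by "+" (a "*" would be subject to filename expansion when left unquoted).

```shell
#!/bin/sh
# Example accumulated score string (illustrative values only)
myscore="0 + 12 + 20 + -5"

# $myscore is deliberately left unquoted so that every number and
# operator reaches expr as its own argument
total=`expr $myscore`
echo "total: $total"
```

Note that expr exits with a non-zero status when the result is 0, so test the printed value rather than the exit code.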

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.
