Re: continue scoring on next recipe

At 19:47 2002-01-18 +0000, Stig Brautaset wrote:

[accumulating a score across several recipes]

As long as you're mostly just adding to and subtracting from the score, the following procedure should be very effective. If you want to evaluate the total score as you go (say, to bail out early because a threshold is exceeded), its effectiveness may not be as great - but it'll still be better than constantly filtering the message to add headers.

I solved it by creating a short C program called `add' that just adds up
its commandline arguments (if they are numbers). If the program is
called with the `-f' option, it will return the sum of its arguments as
its exit code.


Tricky if the sum becomes substantial, since an exit code can only carry values from 0 to 255.

Why not use the arithmetic expressions afforded to you by a good Bourne-type shell?


        echo $(( 12 + 3 ))

(this works with Bourne-based shells like bash -- you should check that yours supports it before basing scripts on it)


So, in procmail, you could get the math result from:

somescore=`echo $(( 12 + 3 ))`
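On the point of checking shell support first, a minimal probe might look like the sketch below. The variable name arith_ok is only for illustration; the eval runs in a subshell so that a shell which chokes on the syntax does not take the calling script down with it.

```shell
# Probe whether this shell understands $(( )) arithmetic expansion.
# The eval sits in a subshell so a syntax error there cannot abort us.
if (eval 'test $(( 12 + 3 )) -eq 15') 2>/dev/null; then
    arith_ok=yes
else
    arith_ok=no
fi
echo "arithmetic expansion supported: $arith_ok"
```

If this reports "no", the procmail assignment above will not give you a usable number under that shell.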

A few pointers to what can be a good idea to include checks for (and
their severity) is appreciated; e.g. do spammers usually skip or add any
headers?

I think perhaps you should read the volumes of material which have been posted through this list and examine the various spam filters which exist. There are a lot of criteria, some more likely than others to identify spam, but it differs from individual to individual, as does the weighting and what is considered an acceptable level of collateral damage.

# this checks the header only

[snip]

I'd probably just maintain an ongoing score value rather than stuffing it into the header at each individual step. There is much less overhead that way (at least as it pertains to rewriting the message - we're still spawning other processes to perform the tabulation). When you've passed all the rules which add to the scoring, you could add it to the header, but still probably do your comparison against the score variable you stored the results in.


# set initial value
myscore=0

# this checks the header only and is CASE SENSITIVE
:0 D
* 2^1.5 (!|\?|\*|\$)
* 10^1  ^(SUBJECT|TO|FROM|DATE)
{
        # myscore is now a TEXT string like "0 + 20", not a number
        myscore="$myscore + $="
}

# do some checks in the body as well
:0 B
* 20^1 (flame|sex)
{
        # myscore is now a TEXT string like "0 + 20 + 20"
        myscore="$myscore + $="
}

# file as spam if accumulated score is more than defined limit
:0
* ? test $(( $myscore )) -gt $LIMIT
{
        # note scoring in header if and only if this matches.  move this
        # ABOVE the encapsulating rule if you always want the header added.
        # The 1* bit seems to be required to force the evaluation under
        # certain situations.
        :0f
        | formail -I "X-MyScore: $(( 1 * $myscore ))"

        :0:
        spam
}

The net result here is that you end up invoking the shell ONCE to perform the test, and optionally once to insert the header if you actually want it in your messages. In your version, you're relying on a support app, and you're adding MULTIPLE headers to your message, with a shell invocation each time that pipes the message through, which is a big waste of processing and memory if the message is huge.
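To see the deferred evaluation outside of procmail, here is a rough shell-only sketch. The two hits of 20 and the LIMIT of 25 are made-up values standing in for whatever your recipes and threshold would actually produce; the point is only that the score stays a text expression until a single $(( )) resolves it.

```shell
# Hypothetical values; procmail would supply these via $= in real recipes.
myscore=0
LIMIT=25                        # assumed threshold, pick your own

for hit in 20 20; do            # pretend two recipes each scored 20
    # plain string concatenation, no arithmetic and no extra process
    myscore="$myscore + $hit"
done
echo "expression: $myscore"

# one arithmetic expansion resolves the whole accumulated expression
total=$(( $myscore ))
echo "total: $total"

if test "$total" -gt "$LIMIT"; then
    verdict=spam
else
    verdict=ham
fi
echo "verdict: $verdict"
```

The string grows by a few characters per matching recipe, and the message itself is never touched until (and unless) you decide to write the header.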

Here is the C program source:


[snip]

Here is the shell source <g>:

        echo $(( math_expression ))

If you don't have a Bourne-like shell supporting arithmetic operators, there is always 'bc', which requires piping the echo to resolve. I use bash, so I don't bother with it in these cases (and the workaround you'd need to use - "echo $myscore | bc" won't easily work from bash due to the way subshells are invoked for pipelines).


---
 Sean B. Straw / Professional Software Engineering
