At 19:47 2002-01-18 +0000, Stig Brautaset wrote:
[accumulating a score across several recipes]
As long as you're mostly adding to and subtracting from the score, the
following procedure should be very effective. If you want to evaluate the
total score as you go (say, to bail out early once a threshold is
exceeded), its effectiveness may not be as great, but it'll still be
better than constantly filtering the message to add headers.
I solved it by creating a short C program called `add' that just adds up
its commandline arguments (if they are numbers). If the program is
called with the `-f' option, it will return the sum of its arguments as
its exit code.
Tricky if the sum becomes substantial: an exit code can only carry values
from 0 to 255.
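To illustrate the exit code limit, here is a minimal sketch you can run by hand. The value 300 is just an arbitrary example, and the exact result for out-of-range values is shell-dependent (it is usually the value modulo 256):

```shell
# Ask a child shell to exit with a status larger than one byte can hold.
# The "|| status=$?" form captures the status without tripping "set -e".
status=0
sh -c 'exit 300' || status=$?

# On most systems this reports 44 (300 modulo 256), not 300.
echo "requested 300, got $status"
```

Whatever the shell does with the overflow, the caller never sees more than 0-255, so a score passed back this way is unreliable once it grows.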
Why not use the arithmetic expansion offered by a good Bourne-type
shell?
echo $(( 12 + 3 ))
(this works with Bourne-based shells like bash -- you should check that
yours supports it before basing scripts on it)
so, in procmail, you could get the math result from:
somescore=`echo $(( 12 + 3 ))`
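If you're unsure whether your shell handles this, a quick probe along these lines might help. It is only a sketch: it assumes the shell procmail will use is the `sh` on your PATH, and the variable names `probe` and `arith_ok` are made up for the example.

```shell
# Run the expression in a child shell so that a syntax error in an old
# shell cannot abort this script; stderr is discarded for the same reason.
if probe=$(sh -c 'echo $(( 12 + 3 ))' 2>/dev/null) && [ "$probe" = "15" ]; then
    arith_ok=yes
else
    arith_ok=no
fi

echo "arithmetic expansion supported: $arith_ok"
```

If procmail's SHELL variable points somewhere else, substitute that shell for `sh` in the probe.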
A few pointers to what can be a good idea to include checks for (and
their severity) is appreciated; e.g. do spammers usually skip or add any
headers?
I think perhaps you should read the volumes of material which have been
posted through this list and examine the various spam filters which
exist. There are a lot of criteria, some more likely than others to
identify spam, but they differ from individual to individual, as do the
weighting and what is considered an acceptable level of collateral damage.
# this checks the header only
[snip]
I'd probably just maintain an ongoing score value rather than stuffing it
into the header at each individual step. There is much less overhead that
way (at least as it pertains to rewriting the message - we're still
spawning other processes to perform the tabulation). When you've passed
all the rules which add to the scoring, you could add it to the header, but
still probably do your comparison against the score variable you stored the
results in.
#set initial value
myscore=0
# this checks the header only and is CASE SENSITIVE
:0 D
* 2^1.5 [!?*$]
* 10^1 ^(SUBJECT|TO|FROM|DATE)
{
# $= holds the score of the recipe above, so this builds a TEXT
# string like "0 + 20" (no arithmetic happens yet)
myscore="$myscore + $="
}
# do some checks in the body as well
:0 B
* 20^1 (flame|sex)
{
# this appends to the TEXT string, giving e.g. "0 + 20 + 20"
myscore="$myscore + $="
}
# file as spam if accumulated score is more than defined limit
# (LIMIT must already be set earlier in the rcfile)
:0
* ? test $(( $myscore )) -gt $LIMIT
{
# note scoring in header if and only if this matches. move this
# ABOVE the encapsulating rule if you always want the header added.
# The 1* bit seems to be required to force the evaluation under
# certain situations.
:0f
| formail -I "X-MyScore: $(( 1 * $myscore ))"
:0:
spam
}
The net result here is that you end up invoking the shell ONCE to perform
the test, and optionally once more to insert the header if you actually want
it in your messages. In your version, you're relying on a support app and
adding MULTIPLE headers to your message; each one costs a shell invocation
with the whole message piped through it, which is a big waste of processing
and memory if the message is huge.
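Outside of procmail, the same accumulate-as-text, evaluate-once idea can be sketched in plain shell. The individual scores and the LIMIT value below are invented for the example; in the recipes above they would come from $= and from your own rcfile.

```shell
LIMIT=25    # hypothetical threshold, pick your own
myscore=0   # starts out as the text "0"

# Each matching rule only appends text; no process is spawned here.
myscore="$myscore + 12"
myscore="$myscore + 20"
myscore="$myscore + -5"   # negative scores work the same way

# A single arithmetic expansion evaluates the whole expression at the end.
total=$(( $myscore ))

if [ "$total" -gt "$LIMIT" ]; then
    verdict=spam
else
    verdict=ham
fi
echo "$myscore = $total ($verdict)"
```

With these example numbers the string is "0 + 12 + 20 + -5", which evaluates to 27 and lands over the limit.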
Here is the C program source:
[snip]
Here is the shell source <g>:
echo $(( math_expression ))
If you don't have a Bourne-like shell supporting arithmetic expansion,
there is always 'bc', which needs the expression piped into it. I use
bash, so I don't bother with it in these cases (and the workaround you'd
need to use - "echo $myscore | bc" - won't easily work from bash due to the
way subshells are invoked for pipelines).
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.