At 14:25 2002-01-30 -0800, Harry Putnam wrote:
I've been refining a spam-catching string of recipes for a while now,
probably not the smartest ones out there, and they certainly could have more
sophistication in the regex department. The way I go at it is to have a
series of recipes with different screening regexes, and NOT (!)
operators, that write to groups like spam_suspect[1-8].
You could set variables based on each set of tests which a message tripped
on, then
Harken back to the discussion just this past week about continuing
scoring. Similar principles would apply. Over a period, you might
determine which rules are more effective at positively identifying spam
more often, and then move them to the head of your rules and return to
filing them away as suspect rather than continuing to test others. For all
I know, you applied this technique to arrive at where you are now...
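
A rough sketch of the variable-setting idea, for what it's worth. SPAMHITS,
the rule names, the X-Spam-Hits header and the spam_suspect.in folder are
all made up for illustration, and the two tests are just borrowed from the
recipe discussed further down; substitute your own:

```
# Start each message with an empty list of tripped tests.
SPAMHITS=

# Each test only records that it fired; nothing is delivered yet.
:0
* ^To:.*@pop\.newsguy\.com
{ SPAMHITS="$SPAMHITS to-newsguy" }

:0 D
* ^Received:.*(\.tw |\.kr|bogota\.supernet\.com\.co)
{ SPAMHITS="$SPAMHITS received-origin" }

# If anything fired, note the rule names in a header, then file the message.
:0
* SPAMHITS ?? .
{
  :0 fhw
  | formail -I "X-Spam-Hits:$SPAMHITS"

  :0:
  spam_suspect.in
}
```

Grepping the suspect folder for X-Spam-Hits after a few weeks should give
a fair idea of which tests deserve to sit at the head of the rules.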
a similar process with false hits. Find the offending filter regex
and fix it.
Some false hits are most easily avoided by greenlisting some lists or
posters. Depends on the nature of your email. For instance, I use scoring
to accumulate a weighted count of things such as exclamation points (which
appear frequently in spam, because spammers are just so
enthusiastic). Programming lists (including this one), as well as digests,
shouldn't be subjected to such logic because they either make extensive use
of bangs, or (in the case of digests) represent a large number of
messages. I've thought about allowing X bangs per KB of message or
something, but haven't sat down to figure out how to apply such logic.
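
For what it's worth, here is an untested sketch of the "X bangs per KB"
idea using procmail's length scoring. As I read procmailsc(5), a condition
of the form "* w^x > L" contributes w * (M/L)^x for a message of length M,
so a negative weight there behaves like a per-KB allowance. The allowance
of 5, the greenlist headers and the folder name are placeholders, and the
length is that of the whole message, not just the body:

```
# 1st condition: greenlist, skip anything that looks like list traffic.
# 2nd condition: add 1 for every exclamation point in the body.
# 3rd condition: subtract 5 for every 1024 bytes of message.
# The recipe fires only if the total ends up above zero.
:0:
* ! ^(List-Id|X-BeenThere|X-Mailing-List):
* 1^1 B ?? [!]
* -5^1 > 1024
spam_suspect_bangs.in
```

With these numbers, a 2 KB message would need more than 10 bangs before
being filed as suspect.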
The rub is that my .log file doesn't show me a very detailed report of
what was hit. I have full logging on
Ouch. If you define your spam rules within a separate file, which you
INCLUDERC into your main procmail.rc, then you can also include them from a
standalone testing .rc file, which defines a different LOGFILE and working
directory, sets DEFAULT to /dev/null, and enables VERBOSE. Then,
if you have a message (or a whole mailbox) you want to see more information
about, you can pump it into the test script, which will report verbosely
(to the separate testing logfile). Meanwhile, each message
coming into your mailbox isn't generating hundreds of lines of logfile
content unnecessarily.
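
Such a testing rc file might look something like the following. The file
name, the scratch directory and the path to the shared rules are
assumptions; adjust them to wherever your files actually live:

```
# spamtest.rc: standalone harness, never referenced from ~/.procmailrc
MAILDIR=$HOME/spamtest          # scratch directory; must already exist
LOGFILE=$MAILDIR/spamtest.log   # kept apart from the production log
VERBOSE=on
DEFAULT=/dev/null               # whatever the rules don't file is discarded

INCLUDERC=$HOME/.procmail/spam.rc   # the same rules the main rc file includes
```

Feed it a single message with "procmail ./spamtest.rc < suspect.msg", or a
whole mailbox with "formail -s procmail ./spamtest.rc < suspect.mbox".
Folders named with relative paths in the rules (spam_suspect2.in and
friends) will be created under the scratch MAILDIR, so the real ones are
left alone.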
32133 procmail: Match on
"^To:.*@pop.newsguy.com|^Received:.*\
(\.tw |\.kr|[^0-9.]202\.|[^0-9]211\.|\
[^0-9.]6[1-6]\.|bogota\.supernet\.com\.co)"
Yea, not very useful.
:0 D
* ^To:.*@pop.newsguy.com|^Received:.*\
(\.tw |\.kr|[^0-9.]202\.|[^0-9.]211\.|\
[^0-9.]6[1-5]\.|bogota\.supernet\.com\.co)
:0 D:
* Other_conditions_just_like_you_have_them_but_for_efficiency_do_them_first
* 1^1 ^\/To:.*@pop\.newsguy\.com
* 1^1 ^\/Received:.*\/(\.tw |\.kr|[^0-9.]202\.|[^0-9.]211\.|\
[^0-9.]6[1-5]\.|bogota\.supernet\.com\.co)
spam_suspect2.in
(note that in your recipe, you're not LOCKING, and you're also not escaping
the dots in your mail host)
With the scoring, I'm busting your combined expression out into separate
ones - that allows you to see the individual lines which would have matched
when you have verbose logging on (perhaps BOTH the To: and a Received:
would have matched - but with a combined expression, you wouldn't
know). With the scoring number, you'll also have a feel for how many times
an individual expression was matched - and EACH one will be emitted in the
verbose log, although only ONE will remain in $MATCH to be used by your
script. Also, unless you tack a '.*' onto the ends of those lines, the
expression will terminate at the part of the line which actually matches -
it will not show the FULL line to the end. I think it's more useful this
way, since it gives you more feedback about which subexpression (all those
ORs) actually matched.
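
If the aim is just to see what matched without turning verbose logging on,
$MATCH can be written to the log directly. A minimal sketch, using a
shortened version of the Received: test and made-up log wording:

```
:0 D
* ^\/Received:.*(\.tw |\.kr|bogota\.supernet\.com\.co)
{
  # LOG text is appended to the logfile even with VERBOSE off;
  # the closing quote sits on its own line to supply the newline.
  LOG="spam_suspect2 hit: $MATCH
"

  :0:
  spam_suspect2.in
}
```

Each message filed this way should leave one extra line in the regular
logfile showing the header text up to the point where the expression
matched.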
I expect this is the glimmer in the darkness which you were looking for.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.