At 17:10 2004-03-01 -0500, Eric Wood wrote:
> For performance reasons, my goal is to *not* have it run grep two times just
> to set a variable as in:
> :0 HB
> * ? grep -o -i -f /etc/vmail/spam_words

BTW, I should point out that as message sizes and wordlists grow, this has
the potential to get ugly. In the interests of performing an encompassing
search, the expression tables within grep can get pretty large. IME, even
more so if you use the -w argument.
Keep in mind that several words may match WITHIN other legitimate words, or
be used with a completely legitimate meaning. Scunthorpe is a town in
the UK, you can screw in a lightbulb, help the neighbour's teen fix his
dripping cam covers, and you can hear about your grandfather's new pet
black pussycat.
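
To make the substring problem concrete, here is a minimal sketch. The two-word list and the sample lines are made up for illustration; they are not taken from anyone's real spam_words file. It counts matching lines with and without -w:

```shell
#!/bin/sh
# Hypothetical two-word list and sample lines, for illustration only.
words=$(mktemp) && sample=$(mktemp) || exit 1
printf '%s\n' 'cialis' 'screw' > "$words"
printf '%s\n' \
    'Ask a specialist about the dosage.' \
    'How many people does it take to screw in a lightbulb?' \
    'Meeting moved to Thursday.' > "$sample"

# Substring matching: "cialis" hits inside "specialist".
substring_hits=$(grep -i -c -f "$words" "$sample")

# Whole-word matching (-w) drops the "specialist" hit, but the
# innocent use of "screw" still matches.
word_hits=$(grep -i -w -c -f "$words" "$sample")

echo "substring: $substring_hits  whole-word: $word_hits"
rm -f "$words" "$sample"
```

With this input, the plain search flags two lines and the -w search still flags one, so -w narrows the problem without removing it.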
Also, BASE64 encoded attachments have a way of matching some curious words
just because of the encoding.
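
A small sketch of how that can happen. The string "cialis++" is a hypothetical example chosen because every character in it is a valid BASE64 symbol, and the -d option is assumed to be supported by your base64 utility (GNU coreutils has it). Decoding it yields six arbitrary bytes, and encoding those bytes the way an attachment would be encoded reproduces the word:

```shell
#!/bin/sh
# "cialis++" is a hypothetical 8-character run of valid BASE64 symbols.
# Decoding gives 6 arbitrary bytes; re-encoding them reproduces the text.
# Assumes a base64 utility that accepts -d (GNU coreutils does).
encoded=$(printf 'cialis++' | base64 -d | base64)
echo "encoded attachment line: $encoded"

# The same kind of keyword grep used on the message now "finds" the word.
b64_hits=$(printf '%s\n' "$encoded" | grep -i -c 'cialis')
echo "hits: $b64_hits"
```

Real attachments run to thousands of such lines, and -i widens the net further, so short keywords have a fair chance of turning up somewhere in the encoded body.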
Hopefully you don't get too broad with the keyword list.
BTW, you might just archive off a copy of your mail and run this as a
standalone filter to test it, say in a sandbox, using formail to split the
mailbox into individual messages. A one-time spike in CPU during that test
is hardly a big concern, and you don't have to optimize the whole process
just to see whether the underlying method of flagging junk is even
_viable_.
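
One possible shape for such a sandbox run, as a rough sketch only. The mailbox path, the helper script name, and the results file are all hypothetical, and the helper simply counts matching lines across the whole message, roughly what the HB flag would feed to grep:

```shell
#!/bin/sh
# Offline test harness sketch; every path below is an assumption.
WORDS=${WORDS:-/etc/vmail/spam_words}        # wordlist from the recipe
MBOX=${MBOX:-$HOME/sandbox/archive.mbox}     # hypothetical mail archive copy
SCAN=${SCAN:-${TMPDIR:-/tmp}/scan_one.sh}    # hypothetical per-message helper
RESULTS=${RESULTS:-${TMPDIR:-/tmp}/scan_results.txt}
export WORDS

# Helper: reads ONE message on stdin, prints "<hit count> <Subject line>".
cat > "$SCAN" <<'EOF'
#!/bin/sh
msg=$(mktemp) || exit 1
cat > "$msg"
hits=$(grep -i -c -f "$WORDS" "$msg")
subj=$(sed -n '/^Subject:/{p;q;}' "$msg")
echo "$hits $subj"
rm -f "$msg"
EOF

# formail -s splits the mbox and runs the helper once per message.
# Skipped quietly if formail or the archive copy is not available.
if command -v formail >/dev/null 2>&1 && [ -r "$MBOX" ]; then
    formail -s sh "$SCAN" < "$MBOX" > "$RESULTS"
fi
```

Sorting the results file numerically shows which messages trip the list most often, and running the whole script under time gives a feel for the CPU cost before any of it goes near a live .procmailrc.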
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.