procmail

RE: Searching for words in external word list?

2008-12-11 14:49:42
Professional Software Engineering wrote Thursday, December 11, 2008 5:18 PM:

At 10:44 2008-12-11 +0000, Dave Wood did say:

* ^Subject:.*${BADWORDS}*

doesn't seem to work. Am I right in thinking that procmail
is treating the | characters as part of the search string
rather than an operator?

Well, you'd want to wrap the massive OR'd list in parentheses
(otherwise the | splits the whole expression: only the first
word in your list would be combined with the ^Subject:.* part,
and each of the rest would match on its own anywhere in the
header), and lose the trailing asterisk (which would match ZERO
or more of the preceding token).  Also, put a $ before the
regexp so that the variable is expanded before the condition
is evaluated:

* $ ^Subject:.*($BADWORDS)

Those are parentheses, not braces.  The braces you're using
will cause regexp operators in the BADWORDS variable to be
escaped so that they are evaluated as literals.  For
example:

         donkey|ape|zebra

         would become:

         donkey\|ape\|zebra

The rest of Sean's advice -- most snipped here -- was fine, but
this part is not correct.  The curly braces do not cause quoting!

To quote a variable's regexp metacharacters, put a backslash after
the dollar sign, as in $\FOO:

 % procmail -m DEFAULT=/dev/null 'FOO="foo|bar"' BAR='"${FOO}"' 'LOG="BAR is $BAR"' /dev/null < /dev/null
BAR is foo|bar

 % procmail -m DEFAULT=/dev/null 'FOO="foo|bar"' BAR='"$\FOO"' 'LOG="BAR is $BAR"' /dev/null < /dev/null
BAR is ()foo\|bar
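The backslash form matters in a recipe when a variable holds a literal string rather than a regexp. Here is a minimal sketch, assuming a hypothetical SENDER variable and a hypothetical folder named friends (neither comes from this thread):

```procmail
# Hypothetical value, for illustration only
SENDER = "joe.user+lists@example.com"

# $\SENDER escapes the . and + so they are matched literally.
# With a plain $SENDER, "r+" would mean "one or more r" and the
# condition would fail to match the real address.
:0
* $ ^From:.*$\SENDER
friends
```

A list of alternatives like BADWORDS is the opposite case. There you want a plain $BADWORDS so the | operators stay live.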


BTW, you need to be prepared for your badwords list to get
screwed up - if you're editing it right at the time it gets
used for an evaluation, what do you figure happens if BADWORDS
is an empty list?

Good point.
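To spell out the answer: with an empty BADWORDS, the condition expands to ^Subject:.*() and the empty group matches. Every message with a Subject: header would then be filed as spam. The following is a minimal sketch of a guard. It assumes BADWORDS is set earlier in the rcfile and uses the `VARIABLE ?? regexp` condition form, which tests a variable's value instead of the message:

```procmail
# BADWORDS is assumed to be set earlier in the rcfile,
# e.g. BADWORDS = "donkey|ape|zebra"
# The first condition skips the recipe when BADWORDS is empty or unset.
:0
* BADWORDS ?? .
* $ ^Subject:.*($BADWORDS)
spam
```

The guard only catches an empty or unset value. A stray leading or trailing | can have the same match-everything effect, because it leaves an empty alternative in the list.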


I use a different tack for wordlist matching:  the wordlist
file has one word per line, and I basically invoke grep on the
match line.  In actuality, I'm using a specialized program to
accomplish this, but a grep equivalent would be something like:

:0
* ? formail -xSubject: | fgrep -i -f $BADWORDS
spam

We'll want to quote "$BADWORDS" in the line that gets evaluated in the
shell, in case the filename contains whitespace or a shell metacharacter, I think.
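The following sketch puts Sean's grep approach and that quoting together. BADWORDFILE is a hypothetical name, chosen so the filename does not collide with the regexp variable BADWORDS from earlier in the thread. The sketch also uses grep -F -i, which is equivalent to fgrep -i:

```procmail
# Hypothetical path to the word list: one word or phrase per line
BADWORDFILE = $HOME/.badwords

# formail extracts the Subject: contents; grep -F matches the words
# as fixed strings, case-insensitively. Quoting the filename should
# keep whitespace or shell metacharacters in it from being mangled.
:0
* ? formail -xSubject: | grep -F -i -f "$BADWORDFILE"
spam
```

If the file is missing, grep exits with an error status and the condition simply fails, so no mail is misfiled. One thing to watch: a blank line in the word list is an empty pattern. An empty pattern matches any message that has a Subject: header.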

The OP's question has been addressed numerous times in the past, and
I would urge him to go to the searchable list archives and have at it.
I rue the devolution of the list into something akin to the Microsoft
public newsgroups, where every day newbies come and ask the very same
questions over and over and over and over and over, and nobody ever
actually tries to consult the list history, which almost certainly
has covered this material at some point in 15 years...

I have nothing against new and unsure users' asking questions --
don't get me wrong.  In my best-effort fantasy world, though,
they look for answers, then come here and say, "I read {blah}
in the list archives but am having trouble applying it to my
specific need; here is why..."

Dallman

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail