procmail

Re: debugging \<(xxx|yyy|zzz)\>

2004-05-29 16:35:53
On Sat, May 29, 2004 at 8:48:25AM -0500, David wrote:
if the question is how to do it in a single condition so as to need
to scan the variable only once, there's no solution I know that
won't get tripped up when two trigger words are separated by only
one space or punctuation mark.  Well, you could do this,

SPAMSCORE=`echo $SPAMSCORE | sed 's/[^a-zA-Z0-9]/&&/g'`

and then use

* 1^1 SPAMSCORE ?? ()\<(wild|teens?|semprini)\>

but it calls a shell and sed, so it's probably less efficient
than the extra scans.  If you were searching the entire body rather
than one variable, though, the shell and sed would probably be the
better way to go. 

Ah, that's one I hadn't thought of.  But for a full-body scan, wouldn't
there be limits on how much stuff could get assigned to the variable?
One would hope the extraction that produced the variable would be
reasonably bounded, but I'd worry a bit about pathological cases.

Just to pick a nit, David, don't we need an underscore in that sed class?
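
For what it's worth, here is a minimal shell sketch of the doubling step
with the underscore added to the class.  The sample text and the variable
names (text, doubled) are made up for illustration, and it assumes procmail
treats the underscore as a word character for \< and \>, which I believe
is the case.

```shell
# Hypothetical sample: two trigger words separated by one space,
# and two more separated by a single comma.
text='wild teens,semprini'

# Double every character that is not a letter, digit, or underscore,
# so each word ends up with its own leading and trailing delimiter.
doubled=$(printf '%s\n' "$text" | sed 's/[^a-zA-Z0-9_]/&&/g')

printf '%s\n' "$doubled"
```

With every delimiter doubled, neighboring trigger words no longer have to
share a single space or punctuation mark, which is what let the \<...\>
condition miss one of them in the first place.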


Dallman commented,

You want, I think, "wild" plus one or more whitespace OR
newline OR whitespace+newline, plus "teen"; with the two
words bounded by procmail word delimiters.  Right?

No, Jim wants to count the total appearances of delineated spammish 
words in the text.

Right.  The question first appeared with the self-generated variable
I used in my original post, which is easily dealt with.  But the
general question pertains, as David notes, to scans of incoming text.
I'm reorganizing my spam filters, and am finding it useful to group
various sets of words and phrases into variables, both for maintenance,
and so that they can be used in several places in the filtering process.
That precludes the score-each-word-on-a-separate-line solution, but
the sed filter is interesting.  In the real world, it may not be worth
the trouble, on the theory that the overlap we're discussing may not
occur often enough.  But it's nice to know there is a solution.


P.S.  You're not the Jim Osborn I knew in Heidelberg in the late-middle
nineties, right?

No; I'd love to get to Heidelberg some day.  Bet they have good beer!

He's not the Jim Osborn I knew in Chicago in the early-middle 1980s.

I'm not even the Jim Osborn who also lives on this island in the
Pacific Northwest, USA.  I get some strange messages on my voicemail... :)

Thanks for all the info, gentlemen.

Jim

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
