procmail

Re: debugging \<(xxx|yyy|zzz)\>

2004-05-28 14:30:20
At 13:01 2004-05-28 -0700, Jim Osborn wrote:
I've tried various regex to limit a match to just the words
I'm looking for.  Using the obvious:

SPAMSCORE = "MSGID,NOBODY"
:0
*  1^1 SPAMSCORE ?? ()\<(MSGID|NOBODY)\>
*  1^1 SPAMSCORE ?? ()\<(MSGID|SUBJ|BODY)\>
{}

I expect a score of 2 on the first condition.

You seem to be expecting that the comma in the middle will be considered TWICE as a wordbreak - once at the end of one word and again before the beginning of the other. As a demonstration, add a second comma in the middle of the string - then you'll see that both keywords are in fact evaluated, but the wordbreak must be SEPARATE in each evaluation.
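
A sketch of that demonstration, untested. The LOG line is added here purely for illustration, and it assumes that $= holds the score of the preceding recipe:

SPAMSCORE = "MSGID,,NOBODY"
:0
*  1^1 SPAMSCORE ?? ()\<(MSGID|NOBODY)\>
{ }
# Illustration only: write the score of the recipe above to the logfile.
LOG = "score: $=
"

With the doubled comma, the first comma can close MSGID and the second can open NOBODY, so a score of 2 would be expected. With a single comma, only one of the two words gets to claim it.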

and if it does, why don't each of those match their respective
portions of "xxx,yyy"?  Does the procmail scan "eat" the ',' once
it's seen "xxx," so it's not available to match ",yyy"?

Pretty much.

In any case, can someone straighten me out on the correct way to specify
a set of bounded words?

Change how you bound them?

It _LOOKS_ as if you may be composing a string which you then want to score to see how many keywords might be in it. If so, construct the string with bounds around EACH token:

SPAMSCORE = "[MSGID][NOBODY]"

Then, when you go looking, each set of bounds is specific to the token.

This doesn't even require a change to your current regexps.
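
Put together, that might look like the following sketch (untested; the condition is your first one, unchanged, and the LOG line is an illustrative addition that assumes $= carries the score of the preceding recipe):

SPAMSCORE = "[MSGID][NOBODY]"
:0
*  1^1 SPAMSCORE ?? ()\<(MSGID|NOBODY)\>
{ }
# Illustration only: each token has its own [ and ], so both
# keywords should be counted and the logged score should be 2.
LOG = "score: $=
"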

You owe me, uhm, lessee... One sixpack of MacTarnahans Blackwatch. That'd be an unopened sixpack. <g>

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
