Re: debugging \<(xxx|yyy|zzz)\>

At 13:01 2004-05-28 -0700, Jim Osborn wrote:

I've tried various regex to limit a match to just the words
I'm looking for.  Using the obvious:

SPAMSCORE = "MSGID,NOBODY"
:0
*  1^1 SPAMSCORE ?? ()\<(MSGID|NOBODY)\>
*  1^1 SPAMSCORE ?? ()\<(MSGID|SUBJ|BODY)\>
{}

I expect a score of 2 on the first condition.

You seem to be expecting that the comma in the middle will be considered
TWICE as a wordbreak - once at the end of one word and again before the
beginning of the other. As a demonstration, add a second comma in the
middle of the string - then you'll see that both keywords are in fact
evaluated, but the wordbreak must be SEPARATE in each evaluation.
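
A rough sketch of that demonstration follows. The SCORE variable and the
LOG line are my additions, there only to make the result visible in the
logfile. The expected values assume that \< and \> each consume the
character they match rather than acting as zero-width anchors.

SPAMSCORE = "MSGID,,NOBODY"
:0
*  1^1 SPAMSCORE ?? ()\<(MSGID|NOBODY)\>
{ }
# $= holds the score of the most recent recipe
SCORE = $=
LOG = "score: $SCORE
"

With the doubled comma, each keyword has its own separator and the log
should show a score of 2. With the original single comma, the one
separator is used up by the \> after MSGID, so NOBODY has nothing left
for its \< and the score should come out as 1.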

and if it does, why don't each of those match their respective
portions of "xxx,yyy"?  Does the procmail scan "eat" the ',' once
it's seen "xxx," so it's not available to match ",yyy"?


Pretty much.

In any case, can someone straighten me out on the correct way to specify
a set of bounded words?


Change how you bound them?

It _LOOKS_ as if you may be composing a string which you then want to score
to see how many keywords might be in it. If so, construct the string with
bounds around EACH token:


SPAMSCORE = "[MSGID][NOBODY]"

Then, when you go looking, each set of bounds is specific to the token.

This doesn't even require a change to your current regexps.
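
Something along these lines, as a sketch only. The incremental appends
are my guess at how you might be composing the string, and SCORE and the
LOG line are hypothetical helpers for checking the result. The scoring
condition is your own, unchanged.

SPAMSCORE
# append each token with its own pair of brackets as tests fire
SPAMSCORE = "${SPAMSCORE}[MSGID]"
SPAMSCORE = "${SPAMSCORE}[NOBODY]"

:0
*  1^1 SPAMSCORE ?? ()\<(MSGID|NOBODY)\>
{ }
SCORE = $=
LOG = "score: $SCORE
"

Since "[" and "]" are non-word characters, every token brings its own
leading and trailing wordbreak, so neither keyword depends on a separator
that a previous match has already consumed. For the two tokens above the
score should be 2.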

You owe me, uhm, lessee... One sixpack of MacTarnahans Blackwatch. That'd
be an unopened sixpack. <g>


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

