procmail

Re: debugging \<(xxx|yyy|zzz)\>

2004-05-28 14:41:20
Jim Osborn (not the one I used to know) wrote:

> The depth of my misunderstanding is such that I really can't see why
> the obvious, first form fails; why doesn't \<(xxx|yyy|zzz)\> become:
>
> \<xxx\> or \<yyy\> or \<zzz\>

It does.  But scoring counts NON-OVERLAPPING matches to the regexp.
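To see that the alternation does distribute, yet the count still comes up short, here is a small Python check. It is an illustration, not procmail itself. It assumes, following the procmailrc(5) description, that \< and \> each consume one character from [^a-zA-Z0-9_] (newlines included), and it wraps a hypothetical search area xxx,yyy,zzz in the putative newlines. Python's re.findall returns non-overlapping matches, which is what scoring counts; procmail's newline back-up is left out because no newline sits between the words here.

```python
import re

# Assumption: procmail's \< and \> each consume one non-word character
# (newline included), unlike Python's zero-width \b.
NONWORD = "[^A-Za-z0-9_]"
WORDS = ("xxx", "yyy", "zzz")

# \<(xxx|yyy|zzz)\> versus \<xxx\>|\<yyy\>|\<zzz\>
grouped = NONWORD + "(?:" + "|".join(WORDS) + ")" + NONWORD
distributed = "|".join(NONWORD + w + NONWORD for w in WORDS)

# Hypothetical search area, with a putative newline at each end.
area = "\nxxx,yyy,zzz\n"

# Both forms find the same non-overlapping matches: the comma after xxx is
# consumed by the first match, so yyy has no boundary character left.
print(re.findall(grouped, area))      # ['\nxxx,', ',zzz\n']
print(re.findall(distributed, area))  # ['\nxxx,', ',zzz\n']
```

Three words in, two matches out: under this model every other comma-separated word is skipped, whichever way the regexp is written.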

> Does the procmail scan "eat" the ',' once
> it's seen "xxx," so it's not available to match ",yyy"?

Ingobay. Remember that in procmail's regexp engine, ^, $, \< and \> are not zero-width boundary markers; each must match an actual character, even if it is the putative newline at either end of the search area. The only exception: if a matching string found in the text ends with a newline, procmail backs up to just before that newline before searching for the next occurrence. That way, if the regexp is anchored to newlines at both ends, the same newline can serve both as $ for the line leading into it and as ^ for the line leading out of it. But a comma is not a newline.

If your search area is

MSGID,NOBODY

then the first match to ()\<(MSGID|NOBODY)\> is

<putative newline>MSGID<comma>

and the rest of the search area is

NOBODY<putative newline>

where there are no matches.
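The walk-through above can be replayed with a slightly fuller Python sketch that also models the newline back-up. Everything here is an assumption drawn from this message: \< and \> are translated to a one-character [^A-Za-z0-9_] class, the text gets a putative newline at each end, and after a match that ends in a newline the scan resumes at that newline. The helper name count_matches is hypothetical, and only \< and \> are translated; it is not a full procmail regexp engine.

```python
import re

# Assumptions: \< and \> each consume one character from [^A-Za-z0-9_]
# (newline included); the search area gets a putative newline at each end;
# a match ending in a newline resumes the scan AT that newline.
NONWORD = "[^A-Za-z0-9_]"


def count_matches(procmail_re: str, text: str) -> int:
    """Count non-overlapping matches of a (simplified) procmail regexp."""
    pattern = re.compile(procmail_re.replace(r"\<", NONWORD).replace(r"\>", NONWORD))
    area = "\n" + text + "\n"
    pos = count = 0
    while pos <= len(area):
        m = pattern.search(area, pos)
        if m is None:
            break
        count += 1
        pos = m.end()
        # The one exception: back up before a trailing newline so it can
        # also serve as the leading boundary of the next match.
        if pos > m.start() and area[pos - 1] == "\n":
            pos -= 1
        # Never rescan from the same spot (guards against tiny matches).
        if pos <= m.start():
            pos = m.start() + 1
    return count


print(count_matches(r"()\<(MSGID|NOBODY)\>", "MSGID,NOBODY"))   # 1
print(count_matches(r"()\<(MSGID|NOBODY)\>", "MSGID\nNOBODY"))  # 2
```

With a comma (or a space) between the words only one match is counted; with a newline between them, the shared newline lets both count.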

> In any case, can someone straighten me out on the correct way to specify
> a set of bounded words?

* 1^1 SPAMSCORE ?? ()\<MSGID\>|\<NOBODY\>



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
