Re: debugging \<(xxx|yyy|zzz)\>

Jim Osborn (not the one I used to know) wrote:

> The depth of my misunderstanding is such that I really can't see why
> the obvious, first form fails; why doesn't \<(xxx|yyy|zzz)\> become:
>
> \<xxx\> or \<yyy\> or \<zzz\>


It does.  But scoring counts NON-OVERLAPPING matches to the regexp.

> Does the procmail scan "eat" the ',' once
> it's seen "xxx," so it's not available to match ",yyy"?

Ingobay. Remember that in procmail's regexp engine, ^ and $ and \< and \> are not zero-width boundary markers; they must have a character to match to, even if it's the putative newline at either end of the search area. The only exception is that if a matching string found in the text ends with a newline, procmail backs up over that newline before searching for the next occurrence: that way, if the regexp is anchored to newlines at both ends, the same newline can serve both as $ for the line leading into it and as ^ for the line leading out of it. But a comma is not a newline.
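To make the consuming behaviour concrete, here is a small Python model of how the scoring loop counts matches. It is a sketch, not procmail's code, and it rests on several assumptions:

- word characters are [A-Za-z0-9_];
- only \< and \> are translated, each into one consumed non-word character;
- the search area is wrapped in the two putative newlines;
- the back-up-over-a-trailing-newline rule described above is applied.

The function name procmail_count is made up for illustration.

```python
import re

# Assumption: procmail treats [A-Za-z0-9_] as word characters, so \< and \>
# each CONSUME exactly one non-word character (or one of the putative
# newlines that bracket the search area); they are not zero-width.
NONWORD = r"[^A-Za-z0-9_]"


def procmail_count(regex: str, area: str) -> int:
    # Hypothetical helper: counts matches roughly the way procmail scoring
    # does. Only \< and \> are translated; ^, $ and other procmail syntax
    # are left alone, so use it only with simple patterns like these.
    pattern = re.compile(
        regex.replace(r"\<", NONWORD).replace(r"\>", NONWORD),
        re.IGNORECASE,  # procmail matches case-insensitively by default
    )
    text = "\n" + area + "\n"  # putative newlines at both ends
    count, pos = 0, 0
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        count += 1
        if m.end() - m.start() > 1 and text[m.end() - 1] == "\n":
            # The match ended in a newline: back up over it so the same
            # newline can also open the next match.
            pos = m.end() - 1
        else:
            # Otherwise resume right after the match; consumed characters,
            # such as a trailing comma, are not available again.
            # Step past empty matches to guarantee progress.
            pos = m.end() if m.end() > m.start() else m.end() + 1
    return count


print(procmail_count(r"()\<(MSGID|NOBODY)\>", "MSGID,NOBODY"))  # 1
print(procmail_count(r"()\<(MSGID|NOBODY)\>", "MSGID\nNOBODY"))  # 2
```

Under those assumptions the comma-separated area scores only one match. The newline-separated area scores two, because the trailing newline of the first match is handed back and reused as the leading \< of the second.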


If your search area is

MSGID,NOBODY

then the first match to ()\<(MSGID|NOBODY)\> is

<putative newline>MSGID<comma>

and the rest of the search area is

NOBODY<putative newline>

where there are no matches.
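The same walk-through can be traced with Python's re module, again assuming that \< and \> each consume one character from [^A-Za-z0-9_]. Python's finditer does not model procmail's newline back-up, but that makes no difference for this search area. The sketch also shows that testing each word on its own finds both.

```python
import re

# Assumption: procmail's \< and \> each consume one non-word character,
# approximated here as [^A-Za-z0-9_].
NONWORD = "[^A-Za-z0-9_]"

# The search area wrapped in the putative newlines procmail supplies.
area = "\n" + "MSGID,NOBODY" + "\n"

# ()\<(MSGID|NOBODY)\> with the word bounds spelled out as consuming classes.
combined = re.compile("()" + NONWORD + "(MSGID|NOBODY)" + NONWORD, re.IGNORECASE)

# finditer yields non-overlapping matches, resuming after each one.
matches = []
for m in combined.finditer(area):
    matches.append(m.group(0))
    print(repr(m.group(0)), "leaves", repr(area[m.end():]))
# prints: '\nMSGID,' leaves 'NOBODY\n'

# Each word tested on its own: every search starts over at the beginning,
# so the comma eaten by one pattern is still there for the other.
counts = {
    word: len(re.findall(NONWORD + word + NONWORD, area, re.IGNORECASE))
    for word in ("MSGID", "NOBODY")
}
print(counts)  # {'MSGID': 1, 'NOBODY': 1}
```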

> In any case, can someone straighten me out on the correct way to specify
> a set of bounded words?


* 1^! SPAMSCORE ?? ()\<MSGID\>|\<NOBODY\>



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail