Jim Osborn (not the one I used to know) wrote:
> The depth of my misunderstanding is such that I really can't see why
> the obvious, first form fails; why doesn't \<(xxx|yyy|zzz)\> become:
> \<xxx\> or \<yyy\> or \<zzz\>
It does. But scoring counts NON-OVERLAPPING matches to the regexp.
> Does the procmail scan "eat" the ',' once
> it's seen "xxx," so it's not available to match ",yyy"?
Ingobay. Remember that in procmail's regexp engine, ^ and $ and \< and
\> are not zero-width boundary markers; each must match an actual
character, even if that is only the putative newline at either end of the
search area. The only exception is that if a matching string found in the text
ends with a newline, procmail backs up before the newline before
searching for the next occurrence: that way, if the regexp is anchored
to newlines at both ends, the same newline can serve both as $ for the
line leading into it and as ^ for the line leading out of it. But a
comma is not a newline.
If your search area is
    MSGID,NOBODY
then the first match to ()\<(MSGID|NOBODY)\> is
    <putative newline>MSGID<comma>
and the rest of the search area is
    NOBODY<putative newline>
where there are no matches, because the comma has already been consumed
and nothing is left in front of NOBODY for \< to match.
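
For anyone who wants to poke at this, here is a rough Python model of the
behaviour described above. It is only a sketch, not procmail's actual code:
the names BOUNDARY and count_matches are made up for illustration, and it
assumes that \< and \> each consume one character that is not a letter,
digit, or underscore (a newline counts as such a character).

```python
import re

# Assumption: procmail's \< and \> are modelled as one consumed non-word
# character, rather than as an empty position between characters.
BOUNDARY = r"[^A-Za-z0-9_]"


def count_matches(words, text):
    """Count non-overlapping matches of BOUNDARY word BOUNDARY in text."""
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(BOUNDARY + "(?:" + alternatives + ")" + BOUNDARY)
    # Putative newlines at both ends of the search area.
    area = "\n" + text + "\n"
    count = 0
    pos = 0
    while True:
        m = pattern.search(area, pos)
        if m is None:
            return count
        count += 1
        if m.group().endswith("\n"):
            # Back up before the trailing newline so the same newline can
            # also open the next match.
            pos = m.end() - 1
        else:
            # Any other trailing character (such as a comma) is used up.
            pos = m.end()
```

Under those assumptions, count_matches(["MSGID", "NOBODY"], "MSGID,NOBODY")
gives 1, because the comma is used up by the first match. With a newline
between the two words instead of a comma it gives 2, since the newline is
shared. A zero-width engine, such as Python's own \b, would find both words
in either case.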
> In any case, can someone straighten me out on the correct way to specify
> a set of bounded words?
* 1^! SPAMSCORE ?? ()\<MSGID\>|\<NOBODY\>