procmail

RE: Regexp fails in scoring recipe

2003-05-16 13:40:22
Kevin Wu [mailto:tessar(_at_)bigfoot(_dot_)com] wrote:

> Dallman Ross wrote:
>
>> that, since we are focusing on a rightward anchor, the line
>> end, we should develop our regex NOT_AB from the right, not
>> the left.  Woo-hoo, but that seemed like the key!  And I think
>> it is.
>>
>> In the recipe below, $WS holds a space and a tab, and $NL holds
>> a newline.  I used a header test instead of a body test, and
>> created a header called X-AB-Check: for the testing.
>>
>> --------------------------------------------------
>> NOT_AB = "(.|[^$WS]*([^b]|[^a]b|[^$WS]ab))"
>>
>> :0
>> * $ ^X-AB-Check:((.*\<)?$NOT_AB)?$
>> { LOG = "$NL NOT_AB $NL" }
>>
>> :0 E
>> { LOG = "$NL AB $NL" }
>> --------------------------------------------------
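One rough way to check this recipe's logic outside procmail is to translate the condition into a Python regular expression and try it on single header lines. The sketch assumes procmail's documented behaviour: `\<` is shorthand for one non-word character (`[^a-zA-Z0-9_]`), and conditions are case-insensitive by default. It ignores newlines entirely, so it says nothing about the newline assumption raised in this thread. The `classify` helper and the sample headers are illustrative only and are not part of the original recipe.

```python
import re

WS = " \t"                      # same contents as the recipe's $WS
WORD_EDGE = "[^A-Za-z0-9_]"     # assumed stand-in for procmail's \< (consumes one char)

# NOT_AB from the recipe, with $WS expanded
NOT_AB = f"(.|[^{WS}]*([^b]|[^a]b|[^{WS}]ab))"

# Condition: ^X-AB-Check:((.*\<)?$NOT_AB)?$
# fullmatch() on a single line plays the role of the ^...$ anchors.
CONDITION = re.compile(f"X-AB-Check:((.*{WORD_EDGE})?{NOT_AB})?", re.IGNORECASE)


def classify(header_line: str) -> str:
    """Return which recipe of the pair would log for this header line."""
    return "NOT_AB" if CONDITION.fullmatch(header_line) else "AB"


for line in ["X-AB-Check: ab", "X-AB-Check: x ab", "X-AB-Check: xab",
             "X-AB-Check: abc", "X-AB-Check: ac"]:
    print(f"{line!r:22} -> {classify(line)}")
```

Run as-is, it should report AB for the first two sample lines and NOT_AB for the other three. In this model, then, the E branch is reached only by the samples that end in a stand-alone word "ab".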


> Eliminating the dots for character positioning is good. One
> quibble: the NOT_AB regexp still has an implicit assumption
> about a newline at the right end, but I'm doubtful that the
> assumption can be completely eliminated.

Perhaps not.

> I'm satisfied that the non-scoring recipe now works after
> stripping the carriage returns. I've gone back to the scoring
> recipe since it's more elegant and easier to maintain [. . . .]

I understand, and I agree it's a good choice.  I can't quite
leave our little mind-puzzle alone, though.  I was about to
file away your email when I took one last look, and I saw that
I could make the NOT_AB expression simpler still and, at the
same time, reduce its assumptions further.  For the amusement
of anybody still looking over our virtual shoulders, I submit
this:

--------------------------------------------------
NOT_AB = "(.?|[^$WS]*([^a].|[^$WS]ab))"

:0
* $ ^X-AB-Check:(.*\<)?$NOT_AB$
{ LOG = "$NL NOT_AB $NL" }

:0 E
{ LOG = "$NL AB $NL" }
--------------------------------------------------
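A single-line Python model of the simplified condition can serve as a sanity check. It is not a procmail test. It assumes that `\<` stands for one non-word character (`[^a-zA-Z0-9_]`) and that matching is case-insensitive, as procmail's documentation describes, and it ignores newlines. In this model the `[^a].` alternative seems to leave one gap. A last word whose next-to-last letter is "a" and whose last letter is not "b", such as `ac`, no longer matches and falls through to the AB recipe, whereas the earlier `[^b]|[^a]b` pair caught it. `NOT_AB_PATCHED` is a hypothetical variant, not from the thread. It adds an `a[^b]` alternative that closes the gap in the model but has not been tried in procmail.

```python
import re

WS = " \t"                    # same contents as the recipe's $WS
WORD_EDGE = "[^A-Za-z0-9_]"   # assumed stand-in for procmail's \< (consumes one char)

# Simplified NOT_AB as posted, with $WS expanded
NOT_AB = f"(.?|[^{WS}]*([^a].|[^{WS}]ab))"
# Hypothetical variant: adds a[^b] for words ending in "a" plus a non-"b"
NOT_AB_PATCHED = f"(.?|[^{WS}]*([^a].|a[^b]|[^{WS}]ab))"


def condition(not_ab: str) -> re.Pattern:
    """Build ^X-AB-Check:(.*\\<)?$NOT_AB$ as a single-line Python pattern."""
    return re.compile(f"X-AB-Check:(.*{WORD_EDGE})?{not_ab}", re.IGNORECASE)


def classify(pattern: re.Pattern, header_line: str) -> str:
    """Return which recipe of the pair would log for this header line."""
    return "NOT_AB" if pattern.fullmatch(header_line) else "AB"


SIMPLIFIED = condition(NOT_AB)
PATCHED = condition(NOT_AB_PATCHED)

for line in ["X-AB-Check: ab", "X-AB-Check: xab", "X-AB-Check: abc", "X-AB-Check: ac"]:
    print(f"{line!r:20} simplified={classify(SIMPLIFIED, line):7} "
          f"patched={classify(PATCHED, line)}")
```

The two patterns should agree on every sample line except `ac`, where only the patched variant takes the NOT_AB branch.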

Can we make it purer still?  Not that I can see right now.

Dallman


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail