Kevin Wu [mailto:tessar(_at_)bigfoot(_dot_)com] wrote:
Dallman Ross wrote:
Kevin Wu [mailto:tessar(_at_)bigfoot(_dot_)com] wrote:
So even NOT_AB = "(.|[^a].|a[^b])" is not good enough.
The right answer is
NOT_AB = "(.?|[^a].|a[^b]|(.*)(${NWB}..|${WB}[^a].|${WB}a[^b]))"
where $NWB is not a word boundary character and $WB is a word
boundary character.
That's a very good recital of the problem and a decent proposal,
imho. However, I *still* think it would work just as well (if not
better) this way:
NOT_AB = "(.?|[^a]|a[^b])"
and so on. Although I thought I understood the idiosyncrasies of
what you stated by way of explanation (elided here), I confess I
still don't get the reasons for two chars after your $NWB above.
By way of example, we want to avoid matches on
xyz ab
but we want to match each of the following
xyz tab
xyz crab
xyz Schwab
The regexp "\<(.*)(${NWB}..)$" succeeds on all these
examples. It's true that my definition of NOT_AB has an
implicit assumption about the boundaries around it, and
that's not a desirable characteristic. But it works,
and that's a desirable thing. Here's a way to make the
definition more compact via logical algebra:
NOT_AB = "(.?|(.*)${NWB}..|((.*)${WB})?([^a].|a[^b]))"
In any event, my thought is that $NOT_AB
should stay a clean definition, and the regex can be built
around it to accommodate length of 0-infinity ${NWB} chars.
That would be great if it can be done.
Don't look now, but I think I may have solved it in a way
that leaves me satisfied.
It occurred to me while trying to sleep (which is often when I get
good ideas, though the computer has been turned off by then) :-p
that, since we are focusing on a rightward anchor, the line
end, we should develop our regex NOT_AB from the right, not
the left. Woo-hoo, but that seemed like the key! And I think
it is.
In the recipe below, $WS holds a space and a tab, and $NL holds a
newline. I tested against a header instead of the body, and created
a header called X-AB-Check: for the testing.
--------------------------------------------------
NOT_AB = "(.|[^$WS]*([^b]|[^a]b|[^$WS]ab))"
:0
* $ ^X-AB-Check:((.*\<)?$NOT_AB)?$
{ LOG = "$NL NOT_AB $NL" }
:0 E
{ LOG = "$NL AB $NL" }
--------------------------------------------------
So far, this seems to work on whatever I test it on, from
an empty header to just whitespace to one letter on up,
including when AB directly abuts the colon from the header.
If the header itself is missing, then, yes, we get a false
result (the condition fails, so the E recipe logs AB), but that
seems to be beyond the scope of the question.
(Maybe we can even solve that part, though.)
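
For anyone who wants to poke at the pattern outside of procmail, here is a rough sketch of the same test in Python. It is only an approximation and rests on a few assumptions of mine: Python's `re` module is not procmail's regex engine, `\<` is stood in for by a single non-word character (`[^A-Za-z0-9_]`), the `^` and `$` anchors are replaced by `fullmatch` on one header line, and case folding is left out. The names `NON_WORD`, `CHECK` and `classify` are made up for the illustration.

```python
import re

WS = " \t"                    # $WS in the recipe: a space and a tab
NON_WORD = r"[^A-Za-z0-9_]"   # assumed stand-in for procmail's \<

# NOT_AB is built from the right: the last word must not be exactly "ab".
NOT_AB = rf"(.|[^{WS}]*([^b]|[^a]b|[^{WS}]ab))"

# Mirrors "^X-AB-Check:((.*\<)?$NOT_AB)?$"; fullmatch supplies the anchors.
CHECK = re.compile(rf"X-AB-Check:((.*{NON_WORD})?{NOT_AB})?")

def classify(header_line: str) -> str:
    """Return "NOT_AB" if the header line passes the check, else "AB"."""
    return "NOT_AB" if CHECK.fullmatch(header_line) else "AB"

if __name__ == "__main__":
    samples = ["", " ", " a", " ab", "ab", " xyz ab",
               " xyz tab", " xyz crab", " xyz Schwab"]
    for value in samples:
        print(repr(value), classify("X-AB-Check:" + value))
```

Running it prints one verdict per sample value, which makes it easy to add more edge cases. Anything that depends on procmail's own handling of `\<`, line ends, or case still has to be confirmed with the real recipe.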
Dallman
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail