Dallman Ross wrote:
Kevin Wu [mailto:tessar(_at_)bigfoot(_dot_)com] wrote:
Dallman Ross wrote:
Kevin Wu [mailto:tessar(_at_)bigfoot(_dot_)com] wrote:
Dallman Ross wrote:
All right, here is a way around that. We define "not road work"
and use it. Here it is. If you plug it into your recipe, it
should work just fine.
SPACE = " "
# TAB must hold one literal tab character between the quotes, not a space
TAB = "	"
WS = "$SPACE$TAB"
NOT_RW = "[^R]|R[^o]|Ro[^a]|Roa[^d]|Road[^$WS]"
NOT_RW = "$NOT_RW|Road[$WS][^W]|Road[$WS]W[^o]"
NOT_RW = "($NOT_RW|Road[$WS]Wo[^r]|Road[$WS]Wor[^k])"
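To make the construction above easier to check, here is a rough Python sketch that builds the same kind of alternation for a plain literal. The function name not_word_short is made up for illustration, and it assumes the literal contains no regex metacharacters (no "^", "]" or backslash). It does not handle the [$WS] whitespace class used in NOT_RW.

```python
def not_word_short(word: str) -> str:
    """Build an unpadded "not this word" alternation.

    Each alternative repeats the first i characters of the word and then
    requires one character that differs from the word's next character.
    Assumes `word` holds only literal characters (no regex metacharacters).
    """
    alternatives = [word[:i] + "[^" + word[i] + "]" for i in range(len(word))]
    return "(" + "|".join(alternatives) + ")"


if __name__ == "__main__":
    print(not_word_short("ab"))
    print(not_word_short("Road"))
```

For "Road" this yields the first four alternatives of NOT_RW; the remaining ones would need the whitespace class added by hand.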
I think that idea needs a tweak, assuming there are word anchors
around NOT_RW:
In 2D: NOT_AB = "[^a].|a[^b]"
In 3D: NOT_ABC = "[^a]..|a[^b].|ab[^c]"
In 4D: NOT_ABCD = "[^a]...|a[^b]..|ab[^c].|abc[^d]"
and so on. I'll use this idea in a non-scoring recipe.
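The fixed-width pattern above generalizes mechanically. A small Python sketch, with the hypothetical helper name not_word_padded and the assumption that the word is made of plain literal characters, could generate it for any length:

```python
def not_word_padded(word: str) -> str:
    """Build a fixed-width "not this word" alternation.

    Every alternative consumes exactly len(word) characters: the first i
    characters of the word, one character that differs from the next one,
    then "." padding for the rest.
    Assumes `word` holds only literal characters (no regex metacharacters).
    """
    n = len(word)
    alternatives = [
        word[:i] + "[^" + word[i] + "]" + "." * (n - i - 1)
        for i in range(n)
    ]
    return "|".join(alternatives)


if __name__ == "__main__":
    for w in ("ab", "abc", "abcd"):
        print(w, "->", not_word_padded(w))
```

The padding is what distinguishes this form from the unpadded one: each branch matches the same number of characters, which only matters when the surrounding condition anchors both ends.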
No, I don't see it that way. For NOT_AB, we don't care if
there is a second char at all if the first is not A. Why
parse for the second char? It just uses up cycles.
Here, we see that it's not A, and we stop.
As for anchors, I realize that "road work" is not to be
confused with
  "she was driving an overbroad working rig along I-80"
  .........................^^^^^^^^^
But I purposely didn't code word boundaries in, because
that does not, imho, belong in the definition of "NOT_whatever";
but rather in the surrounding recipe's code.
For example, with NOT_AB defined as "([^a]|a[^b])", if we
know it's two letters and want to code it that way, we could
code
* $ ()\<$NOT_AB\>
and that's that. If you look at my searches for ROAD WORK in
the conditions I coded earlier, you'll notice I always put a $
at the end of WORK, because, without exception, every entry I see
in those traffic reports ends that way. One day they could slip
up and put a space or a tab after it, but then I'll merely get
a false positive and see a report that I otherwise might
not have -- a small price to pay for a clean, known word
boundary.
If there's some specific reason to have a char count, then,
sure, go with "([^a].|a[^b])".
It appears that we are both wrong in at least one case. Suppose we use
this recipe:
:0
* $ ()\<$NOT_AB\$
DID_NOT_FIND_AB
:0 E
DID_FIND_AB
and the sample text is
xyz a
If we use your definition
NOT_AB = "([^a]|a[^b])"
then \<[^a]$ is false and \<a[^b]$ is false so the condition is false
and the mail is delivered to DID_FIND_AB, which is wrong because we did
not find AB.
If we use my definition (with parentheses added)
NOT_AB = "([^a].|a[^b])"
then \<[^a].$ is false and \<a[^b]$ is false (as before) so the
condition is false and the mail is delivered to DID_FIND_AB, which is
wrong because we did not find AB. I can fix this case by changing the
definition to
NOT_AB = "(.|[^a].|a[^b])"
If we change the sample text to
xyz bc
your definition still delivers to DID_FIND_AB (wrong) while mine
delivers to DID_NOT_FIND_AB (right).
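For anyone who wants to replay these cases without a live procmail setup, here is a rough Python sketch. It is only an approximation: Python's re module is not procmail's regex engine, the \< anchor is emulated with a zero-width lookaround (my assumption, procmail treats \< differently internally), and the names DEFINITIONS and deliver are invented for this example.

```python
import re

# Stand-in for procmail's \< (start of a word). Assumption: a zero-width
# lookaround is close enough for these short samples.
WORD_START = r"(?<!\w)(?=\w)"

# The three candidate definitions discussed in this thread (labels are mine).
DEFINITIONS = {
    "unpadded": r"[^a]|a[^b]",
    "padded": r"[^a].|a[^b]",
    "padded_fixed": r".|[^a].|a[^b]",
}


def deliver(not_ab: str, text: str) -> str:
    """Mimic the two recipes: the first fires if \\<NOT_AB$ matches."""
    pattern = WORD_START + "(?:" + not_ab + ")$"
    return "DID_NOT_FIND_AB" if re.search(pattern, text) else "DID_FIND_AB"


if __name__ == "__main__":
    for sample in ("xyz a", "xyz bc", "xyz ab"):
        for name, not_ab in DEFINITIONS.items():
            print(f"{sample!r:9} {name:13} -> {deliver(not_ab, sample)}")
```

Under those assumptions the sketch agrees with the analysis above for "xyz a" and "xyz bc", and all three definitions report DID_FIND_AB for "xyz ab". It is still worth confirming against procmail itself before relying on it.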
Kevin
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail