Dallman Ross wrote:
Kevin Wu [mailto:tessar(_at_)bigfoot(_dot_)com] wrote:
So even NOT_AB = "(.|[^a].|a[^b])" is not good enough.
The right answer is
NOT_AB = "(.?|[^a].|a[^b]|(.*)(${NWB}..|${WB}[^a].|${WB}a[^b]))"
where $NWB is not a word boundary character and $WB is a word
boundary character.
That's a very good recital of the problem and a decent proposal,
imho. However, I *still* think it would work just as well (if not
better) this way:
NOT_AB = "(.?|[^a]|a[^b])"
and so on. Although I thought I understood the idiosyncrasies of
what you stated by way of explanation (elided here), I confess I
still don't get the reasons for two chars after your $NWB above.
By way of example, we want to avoid matches on
xyz ab
but we want to match each of the following
xyz tab
xyz crab
xyz Schwab
The regexp "\<(.*)(${NWB}..)$" succeeds on all these examples. It's true
that my definition of NOT_AB has an implicit assumption about the
boundaries around it, and that's not a desirable characteristic. But it
works, and that's a desirable thing. Here's a way to make the definition
more compact via logical algebra:
NOT_AB = "(.?|(.*)${NWB}..|((.*)${WB})?([^a].|a[^b]))"
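As a rough sanity check, the two definitions can be compared outside procmail. The sketch below is only an approximation written in Python, whose regexp dialect differs from procmail's. It assumes that $NWB is \w and $WB is \W (the thread does not spell these out), and it uses re.fullmatch to stand in for the implicit boundaries mentioned above. The helper name matches and the extra sample strings are made up for illustration.

```python
import re

# Assumptions (not from the thread): a word character is \w, a boundary
# character is \W, and re.fullmatch plays the role of the implicit
# anchors that the surrounding procmail regexp would supply.
NWB = r"\w"   # "not a word boundary" character
WB = r"\W"    # "word boundary" character

# Long form quoted earlier in the thread.
NOT_AB_LONG = rf"(.?|[^a].|a[^b]|(.*)({NWB}..|{WB}[^a].|{WB}a[^b]))"
# Compact form, with the shared "(.*)${WB}" prefix factored out.
NOT_AB_COMPACT = rf"(.?|(.*){NWB}..|((.*){WB})?([^a].|a[^b]))"


def matches(pattern: str, text: str) -> bool:
    """Return True if the whole text is matched by the pattern."""
    return re.fullmatch(pattern, text) is not None


samples = ["xyz ab", "ab", "xyz tab", "xyz crab", "xyz Schwab", "", "a", "xyz"]
for s in samples:
    # Print each sample with the verdict of the long and compact forms.
    print(repr(s), matches(NOT_AB_LONG, s), matches(NOT_AB_COMPACT, s))
```

Under these assumptions both forms reject "xyz ab" and a bare "ab", and accept the tab, crab and Schwab lines. That is a check of the algebra only, not of how procmail itself would evaluate the recipe.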
In any event, my thought is that $NOT_AB
should stay a clean definition, and the regex can be built around
it to accommodate length of 0-infinity ${NWB} chars.
That would be great if it can be done.
In other words, the new recipe was matching road
work events when the regexp was designed to match everything
except road work events. To debug this, I used the \/
token to determine the matching text and put it into the log
Good trick (have done it myself). :)
It works when a regexp matches something when you didn't expect it to
match, but it doesn't work when the regexp fails to match when you
expected it to match. The latter case applied to the original scoring
recipe when it stopped working.
file. This is what I found:
A1: The traffic report body was in DOS format!
Okay, that's great that you found that; but why do the recipes
work for me with mail from the kpix traffic list? I did test
it for a few days, after all. And I also fired up vi (well,
vim) more than a few times on the traffic reports themselves,
and I never saw ^Ms!
I don't know, but it may be related to the way the MTAs are configured
on our mail servers. Perhaps your MTA is stripping the CRs whereas mine
is not. None of the mail originating from within my employer's firewall
has CR in the message bodies, and only a handful of externally
generated messages (other than these traffic reports that I get four
times each weekday) have CR in the bodies.
The traffic reports that I get now do not have CR on some lines of the
message body. Only the lines with traffic event content have CR. The
separator lines between traffic events do not have CR. I suspect the
content is generated by some external feed, and KPIX formats and wraps
the traffic content with KPIX-specific content. The external feed may
have switched from a Unix platform to a Windows platform.
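To illustrate why a stray CR can break a pattern anchored at end of line, here is a small sketch in Python. It is an analogy only: Python's "$" is not procmail's "$", and the sample line is hypothetical because the real report text is not shown here. The names unix_line, dos_line, pattern and tolerant are made up for the example.

```python
import re

# Hypothetical traffic-report line; the actual KPIX wording is not quoted
# in this thread.
unix_line = "Road work on Main St\n"
dos_line = "Road work on Main St\r\n"

# A pattern anchored at end of line, similar in spirit to a recipe that
# relies on "$" directly after the text of interest.
pattern = re.compile(r"Main St$", re.MULTILINE)

# A tolerant variant that allows an optional CR before the end of line.
tolerant = re.compile(r"Main St\r?$", re.MULTILINE)

print(bool(pattern.search(unix_line)))   # Unix line endings
print(bool(pattern.search(dos_line)))    # DOS line endings, strict pattern
print(bool(tolerant.search(dos_line)))   # DOS line endings, tolerant pattern
```

In this sketch the strict pattern finds the Unix line but not the DOS line, because the CR sits between the text and the end of the line. Allowing an optional CR before the anchor is one possible workaround; stripping the CRs before matching would be another.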
Kevin
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail