Kevin Wu [mailto:tessar(_at_)bigfoot(_dot_)com] wrote:
Dallman Ross wrote:
I still don't like padding $NOT_AB. [. . .]
DeMorgan's Law: !(X == 'a' && Y == 'b') equals (X != 'a' || Y != 'b')
Yup.
[good background stuff deleted]
When we implement this logic with a regexp, we need to
consider strings of length other than two between a word
boundary character on the left and a newline on the right.
So even NOT_AB = "(.|[^a].|a[^b])" is not good enough.
The right answer is
NOT_AB = "(.?|[^a].|a[^b]|(.*)(${NWB}..|${WB}[^a].|${WB}a[^b]))"
where $NWB is not a word boundary character and $WB is a word
boundary character.
This covers all the cases: strings of length 0, 1, 2, 3, ..., between
word boundary and newline. After all, we are trying to match
all strings that are not "ab", and all strings of length other than
2 match that description. I think this is exhaustive. It already looks
ugly for two characters. Perhaps there is some way to combine the
regexps for length 2 and lengths 3, 4, ..., but that's beyond me right now.
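The length argument above can be checked by brute force. What follows is only a minimal sketch in Python's re module, not procmail's own regexp engine, and it assumes that $WB and $NWB are a whitespace class and its negation, which the message does not spell out. The helper names (accepted, rejected_strings) are made up for the illustration.

```python
import itertools
import re

# Assumptions for illustration only: the post does not define these classes.
WB = r"[ \t]"    # stand-in for a word boundary character
NWB = r"[^ \t]"  # stand-in for a non-word-boundary character

# Two-character-only version, and the length-aware version from the post.
NOT_AB_2 = r"(?:[^a].|a[^b])"
NOT_AB = rf"(?:.?|[^a].|a[^b]|(?:.*)(?:{NWB}..|{WB}[^a].|{WB}a[^b]))"


def accepted(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


def rejected_strings(pattern, alphabet="ab ", max_len=4):
    """Every string over alphabet, up to max_len chars, that pattern rejects."""
    return [
        "".join(chars)
        for n in range(max_len + 1)
        for chars in itertools.product(alphabet, repeat=n)
        if not accepted(pattern, "".join(chars))
    ]


if __name__ == "__main__":
    # The two-character pattern rejects every string whose length is not 2.
    print(len(rejected_strings(NOT_AB_2)))
    # The length-aware pattern should reject only a trailing word "ab".
    print(rejected_strings(NOT_AB))
```

Under these assumptions the only rejected strings are "ab" itself and longer strings that end in a word boundary character followed by "ab"; something like "cab" is accepted because the "ab" there is not a word of its own.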
That's a very good recital of the problem and a decent proposal,
imho. However, I *still* think it would work just as well (if not
better) this way:
NOT_AB = "(.?|[^a]|a[^b])"
and so on. Although I thought I understood the idiosyncrasies of
what you stated by way of explanation (elided here), I confess I
still don't get the reasons for two chars after your $NWB above.
I had already tried, in my days of playing before finally answering
again last week,
^.*\<?($NOT_AB)*$
.....^.........^
      \         \
and I want especially to point out the `?' and `*'.
I was disappointed that I couldn't get that to work, even where
$NOT_AB is simply "([^a]|a[^b])", though I also tried with the
leading ".?|" thing. In any event, my thought is that $NOT_AB
should stay a clean definition, and the regex can be built around
it to accommodate length of 0-infinity ${NWB} chars.
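One possible reason that pattern could not be made to work: every piece between ^ and $ is optional, so the leading .* can absorb the whole line while the rest matches nothing. The sketch below models this in Python's re module rather than in procmail. Since procmail's \< consumes a character, the optional \< is approximated by an optional non-word character, which is an assumption of the sketch, as is the helper name line_matches.

```python
import re

# The "clean" two-character definition discussed above.
NOT_AB = r"(?:[^a]|a[^b])"

# Model of  ^.*\<?($NOT_AB)*$  with "\<?" approximated by an optional \W.
ATTEMPT = rf"^.*\W?{NOT_AB}*$"


def line_matches(line):
    """True if ATTEMPT matches the line, as an unanchored search would."""
    return re.search(ATTEMPT, line) is not None


if __name__ == "__main__":
    # Lines that end in the word "ab" are supposed to be excluded.
    for line in ["ab", "x ab", "road ab", "anything else"]:
        print(repr(line), line_matches(line))
```

In this model every line matches, including the ones ending in "ab", so the pattern cannot exclude anything. That is consistent with the observation that it fails whether or not the leading ".?|" is present.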
Here's the full solution:
WSPC = " " # whitespace: space + tab
SPC = "[$WSPC]" # Regexp: space + tab
NSPC = "[^$WSPC]" # negation
X = "($[ ]*|[ ]+)" # Optional word break
Good, so far . . .
LOCATIONS = "(Palo${X}Alto|Stanford|Menlo${X}Park|\\
Redwood${X}City|Mountain${X}View|Dumbarton)"
NEL = "(.*($NSPC).*\$)" # Non-empty line
NOT_WORK4 = "([^w]...|w[^o]..|wo[^r].|wor[^k])"
NOT_ROAD_WORK9 = "(\\
[^r]........|r[^o].......|ro[^a]......|roa[^d].....|\\
road${NSPC}....|road${SPC}[^w]...|road${SPC}w[^o]..|\\
road${SPC}wo[^r].|road${SPC}wor[^k])"
NOT_ROAD_WORK10 = "(\\
${NSPC}.........|${SPC}[^r]........|${SPC}r[^o].......|\\
${SPC}ro[^a]......|${SPC}roa[^d].....|\\
${SPC}road${NSPC}....|${SPC}road${SPC}[^w]...|\\
${SPC}road${SPC}w[^o]..|${SPC}road${SPC}wo[^r].|\\
${SPC}road${SPC}wor[^k])"
NOT_ROAD_WORK = "(.?.?.?|$NOT_WORK4|.?.?.?.....|$NOT_ROAD_WORK9|\\
(.*)$NOT_ROAD_WORK10)"
:0
* ^From:.*(\<)KPIX\.Traffic\.Router
* Precedence: bulk
{
:0
* $ B ?? (\$\$)\[ ?[0-2]?[0-9]:(.*)(\<)($NOT_ROAD_WORK)(\$)\
($NEL)*((.*)(\<))?$LOCATIONS\>
{
KEEP_IT = 1
}
:0 E
/dev/null
}
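For anyone who wants to poke at NOT_ROAD_WORK outside of procmail, here is a rough Python translation. It is a sketch under assumptions: Python's re stands in for procmail's engine, $SPC is taken to be space plus tab as the comments in the rcfile say, only the NOT_ROAD_WORK part is tested in isolation (not the surrounding recipe with its \< and \$), and NOT_ROAD_WORK10 is written in an equivalent shorter form as "non-space plus any nine characters, or space plus NOT_ROAD_WORK9". The function name is_not_road_work is invented for the example.

```python
import re

SPC = r"[ \t]"    # space or tab
NSPC = r"[^ \t]"  # anything else

NOT_WORK4 = r"(?:[^w]...|w[^o]..|wo[^r].|wor[^k])"
NOT_ROAD_WORK9 = (
    r"(?:[^r]........|r[^o].......|ro[^a]......|roa[^d]....."
    rf"|road{NSPC}....|road{SPC}[^w]...|road{SPC}w[^o].."
    rf"|road{SPC}wo[^r].|road{SPC}wor[^k])"
)
# Shorthand for the ten alternatives in the rcfile (same strings accepted).
NOT_ROAD_WORK10 = rf"(?:{NSPC}.........|{SPC}{NOT_ROAD_WORK9})"

# Lengths 0-3, length 4, lengths 5-8, length 9, lengths 10 and up.
NOT_ROAD_WORK = (
    rf"(?:.?.?.?|{NOT_WORK4}|.?.?.?.....|{NOT_ROAD_WORK9}"
    rf"|(?:.*){NOT_ROAD_WORK10})"
)


def is_not_road_work(event):
    """True if the whole event text is accepted by NOT_ROAD_WORK."""
    return re.fullmatch(NOT_ROAD_WORK, event) is not None


if __name__ == "__main__":
    for event in ["accident", "stalled car", "road works",
                  "work", "road work", "road\twork", "long road work"]:
        print(repr(event), is_not_road_work(event))
```

In this translation the first three sample events are accepted and the last four are rejected, which is the intended split between ordinary events and road work.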
I started using this solution on Friday. As of today, it has
worked for eight traffic reports in production mode.
Cool beans. I hope you have a decent-size LINEBUF setting. :)
Fortunately, this whole exchange has been enlightening, and
it yielded a non-scoring solution, which I previously thought
was too difficult to even consider. I'll use the non-scoring
solution for a while to see if any bugs pop up. But I'll revert
to scoring eventually, as long as scoring works in some form
on my mail server.
Thanks, Dallman.
I've enjoyed the exchange too, at least when I wasn't cursing it. :-)
In a more recent post, Kevin added:
I found the answers to my questions:
Q1. Why did my original recipe stop working?
Q2. Why does the recipe fail in production mode while
succeeding in test mode?
The reason became evident while I was testing the new
non-scoring recipe in production mode: It was also failing when
the traffic report had only road work events in the locations
of interest. In other words, the new recipe was matching road
work events when the regexp was designed to match everything
except road work events. To debug this, I used the \/
token to determine the matching text and put it into the log
Good trick (have done it myself). :)
file. This is what I found:
A1: The traffic report body was in DOS format!
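A short sketch of how DOS line endings could produce that symptom, again in Python's re module rather than procmail, and using the two-character NOT_AB pattern from earlier in the thread as a stand-in for NOT_ROAD_WORK (with $WB and $NWB assumed to be whitespace and non-whitespace). With CRLF endings each line carries a carriage return in front of the newline, so the text before the end-of-line anchor is one character longer than the pattern expects. Printing the matched text with repr(), in the spirit of logging what \/ captured, makes the stray \r visible. The helper name is_not_ab is hypothetical.

```python
import re

# Assumed character classes; the thread does not define them.
WB = r"[ \t]"
NWB = r"[^ \t]"
NOT_AB = rf"(?:.?|[^a].|a[^b]|(?:.*)(?:{NWB}..|{WB}[^a].|{WB}a[^b]))"


def is_not_ab(line):
    """True if the line (without its newline) is accepted by NOT_AB."""
    return re.fullmatch(NOT_AB, line) is not None


if __name__ == "__main__":
    unix_line = "ab"    # line from a body with LF endings
    dos_line = "ab\r"   # same line from a body with CRLF endings

    print(is_not_ab(unix_line))   # the word "ab" is rejected as intended
    print(is_not_ab(dos_line))    # the trailing CR makes it a 3-char string

    # Log the matched text in a form that shows invisible characters.
    print(repr(re.fullmatch(NOT_AB, dos_line).group(0)))

    # Dropping the CR before matching restores the intended result here.
    print(is_not_ab(dos_line.rstrip("\r")))
```

In this sketch the CRLF version of the line is accepted even though it is the very word the pattern is meant to exclude, which mirrors the recipe matching road work events.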
Okay, that's great that you found that; but why do the recipes
work for me with mail from the kpix traffic list? I did test
it for a few days, after all. And I also fired up vi (well,
vim) more than a few times on the traffic reports themselves,
and I never saw ^Ms!
Dallman
"If you find a path with no obstacles, it probably does not lead to
anywhere."
Thoughts of Rev. Sunnan Kubose, from _Zen in the Markets_
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail