Re: Regexp fails in scoring recipe



Dallman Ross wrote:

I am inclined to believe that some aspect of your
server environment acts differently when you are logged in from
how it acts when you are not.  Maybe in one case procmail runs
under your uid and shell, but otherwise runs under suid root and
root's (postulated to be different) shell?  Do you have a shell
definition line in your .procmailrc?  I recommend "SHELL = /bin/sh".

Yes, I use the following, which has the same effect (at least on Solaris):

SHELL=/usr/bin/sh

In any case, now that I have seen some sample traffic reports and received two directly myself upon having subscribed, I found that a recursive SWITCHRC can work for this and give you abbreviated reports that show you just what you want and not the cruft you don't want.

Sometimes, I like to look at the other cruft to remind myself why I pay more to live close to work :-)


The HTML highlighting helps me do a rapid scan.

First, though, some precursor stuff.  You had in your recipe,

 LOCATIONS="(dumbarton|(east )?palo alto|stanford|menlo park|\\
           redwood city|mountain view)"

As I mentioned in a previous reply, you could have a problem with
the word breaks.  I see in the actual reports that once in a while
the spacing between words is inconsistent, which only corroborates
my earlier concern.  I believe it would be easy to have, e.g.,
"PALO  ALTO" (with two spaces between) show up instead of what
you were expecting, and you'd miss it.  Also, a line end could
happen in the middle of the phrase.  I recommended using only
one word (and had said that you don't, in any case, need the EAST
for EAST PALO ALTO, since you are accepting the second two words
anyway).  If you don't want potential false hits with "REDWOOD"
or "MOUNTAIN", however, then here's another way.  A tab and a
space are found inside each of the two pairs of square brackets:


WRDBRK    = ($[         ]*|[    ]+)
X         = $WRDBRK
LOCATIONS = "(Dumbarton|(East${X})?Palo${X}Alto|Stanford|Menlo${X}Park|\\
            Redwood${X}City|Mountain${X}View)"
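
To see what the word-break idea buys, here is a rough Python stand-in for the pattern above. It is only a sketch: procmail's "$" (line end) is written as "\n", the tab is written as "\t", "\b" stands in for procmail's \< and \>, and the name mentions_location is made up for this illustration. Python's re module is not procmail's regexp engine, so treat it as a way to check the intent, not as proof of procmail's behavior.

```python
import re

# Stand-in for WRDBRK = ($[ \t]*|[ \t]+): either a line break followed by
# optional indentation, or a run of spaces and tabs.
WRDBRK = r"(?:\n[ \t]*|[ \t]+)"

# Same alternatives as LOCATIONS above; {X} marks where WRDBRK is spliced in.
PATTERN = (
    r"\b(?:Dumbarton|(?:East{X})?Palo{X}Alto|Stanford|Menlo{X}Park|"
    r"Redwood{X}City|Mountain{X}View)\b"
).replace("{X}", WRDBRK)

# procmail matches case-insensitively by default, hence IGNORECASE here.
LOCATIONS = re.compile(PATTERN, re.IGNORECASE)


def mentions_location(text):
    """Return True if text names one of the locations of interest."""
    return LOCATIONS.search(text) is not None


if __name__ == "__main__":
    print(mentions_location("ROAD WORK IN PALO  ALTO"))      # two spaces
    print(mentions_location("ACCIDENT IN PALO\n   ALTO"))    # line break
    print(mentions_location("Redwood Highway, San Rafael"))  # one word only
```

Requiring both words keeps "Redwood Highway" from counting as a hit, while the flexible word break still accepts doubled spaces or a wrapped line.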

On Friday, I took some of your advice and arrived at two recipes that perform the filtering correctly (more on that in a bit). Unfortunately, I didn't get an answer to my original question.


For the locations variable, I now use this:

LOCATIONS="(palo( |^)alto|stanford|menlo( |^)park|\\
         redwood( |^)city|mountain( |^)view|dumbarton)"

I don't remember seeing two consecutive spaces in these reports as in "PALO  ALTO", but I'll use your idea above when I get around to it. This change can't hurt, but it doesn't explain the inconsistency in behavior (for example, one road work incident was DUMBARTON, which is unaffected by this change, and yet it was not matched in production mode).

I think two words are necessary to avoid false hits on, say, "Redwood Highway, San Rafael".

This is the first recipe that works using scoring (at least it works in procmail 3.15.2):


 :0 B
 *  1^0
 *  1^1 $ (\<)road work(^.*($NSPC).*)?(^.*($NSPC).*)?(^.*($NSPC).*)?\
          .*(\<)$LOCATIONS\>
 * -1^1 $ (\<)$LOCATIONS\>
 /dev/null

where NSPC = "[^ ]", because I don't want an empty line between the road work line and a line with a location of interest.

The idea of the scoring recipe is that score = 1 + (number of road work events in locations) - (all events in locations) is positive if and only if the number of non-road-work events in those locations is zero. In that case the action is executed, because I don't want to see such a report.
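
The arithmetic can be checked with a few lines of Python. This only models the intended score, not procmail's matcher; the function names report_score and is_discarded are invented for the illustration, and the hit counts are supplied by hand.

```python
def report_score(road_work_hits, location_hits, base=1):
    """Intended score: base + road-work-in-location hits - all location hits.

    base models the "1^0" condition; each road work hit adds 1 ("1^1") and
    each location hit subtracts 1 ("-1^1").
    """
    return base + road_work_hits - location_hits


def is_discarded(score):
    """The recipe's action (/dev/null) runs only when the score is positive."""
    return score > 0


if __name__ == "__main__":
    # Three road work events, each naming one location, and nothing else.
    print(report_score(3, 3), is_discarded(report_score(3, 3)))
    # Same report with one extra location hit (e.g. a non-road-work event).
    print(report_score(3, 4), is_discarded(report_score(3, 4)))
```

With equal counts the score stays at 1 and the report is dropped. A single extra location hit, whether it is a real non-road-work event or a double-counted match, pulls the score down to 0 and the report is kept.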

For reports like the one in the original posting, with two road work events in Menlo Park, one road work event on Dumbarton, and no non-road-work events in locations of interest, this filter works with procmail 3.15.2. However, it doesn't work on procmail 3.22, because in the last condition two occurrences of Dumbarton are counted even though the report has only one. This is yet more weird behavior, albeit in a different version of procmail.


The second recipe that works doesn't use scoring:

 :0 B
 * $ (\<)((problem|accident|slowdown|stall)(s)?|advisor(y|ies))\
     (^.*($NSPC).*)?(^.*($NSPC).*)?(^.*($NSPC).*)?.*(\<)$LOCATIONS\>
 {
   KEEP=1
 }

 :0 E
 /dev/null

This approach cheats in that it attempts to list all the events complementary to road work (i.e. the events I want to see, as opposed to the ones I don't). What I don't like about this recipe is that some new classification could appear in the traffic reports (e.g. "disaster" or "flood"), and the recipe would delete the report even though I would want to see it.

All right, I used the above in my test harness, and it worked fine.
Here is the main recipe I put below that (goes in .procmailrc):

#-------------------------------------------------------------
:0
* ^From: KPIX\(_dot_)Traffic\(_dot_)Router(_at_)kpix\(_dot_)com
* ^Precedence: bulk
* $ B ?? ^\/\[ ()[0-9]:.*$(.+$)*(.*\<)?$LOCATIONS\>.*$(.+$)*.*
{ SWITCHRC = traffic }
#-------------------------------------------------------------

(I added the "Precedence:" check because you are /dev/nulling the
reports that don't have a city of interest in them, and I imagine that
the list administrator might write you some time with an announcement
that you'd otherwise miss.  In my confirmation mail from the list
for signing up, for example, there was no Precedence: header.)

Okay, I'll add the Precedence test.

Now I made a separate rc-file called "traffic".  That gets
run recursively.  It's important to have a breaking occurrence
in a recursive rc; otherwise, it will iterate until your server
goes kablooey, or something.  :-)  I tested this one on two of
the actual 8-a.m. traffic reports from KPIX:

#-------------------------------------------------------------
:0 Dich:
* ! MATCH ?? ^^(.* )?ROAD +WORK$
| echo "$MATCH" >> somefile

:0
* $ B ?? ^$\MATCH(.*$)*\/\[
()[0-9]:.*$(.+$)*(.*\<)?$LOCATIONS\>.*$(.+$)*.*
{ SWITCHRC = $_ }
#-------------------------------------------------------------

(Heh.  Note that there's no scoring.  Not that I have anything against
scoring, but . . . I didn't need it.)  That long condition might wrap
before it gets to the list, so I'll put a version here with a line
break:

* $ B ?? ^$\MATCH(.*$)*\/\[ ()[0-9]:.*$(.+$)*(.*\<)?\
          $LOCATIONS\>.*$(.+$)*.*

I'll keep this one in mind for when I give up on scoring. Thanks for your help.

Since I don't care to dive into the internals of procmail to find the answer to my original question, I'll put it on the back burner for now.


Kevin



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail