procmail

Re: A tool for refining regex

2002-02-04 20:25:02
PSE-L(_at_)mail(_dot_)professional(_dot_)org (Professional Software 
Engineering) writes:

Greenlisting?

[...]
Thanks to both of you for that explanation..

Sorry about my bad manners...  But you might get a little laugh out of
the reason for my delay in response.. at least the biggest one.

Working on what Sean calls `the Sandbox', I edited a little script that
cleans up after a test run.. it deletes the created spools, logs, etc.

It has a few variables ... one of which I edited in a hurry, eager to
try Sean's suggestions.  It sets a few variables, a couple of which
look like:
   [...]
   testbase=/home/reader/test
   maildir=spool
   TESTDIR=$testbase/$spool

  [...]
   And one command from the command section:
   rm -rf $TEST_DIR/*

Well, this script had all kinds of edits and comments and accumulated
junk from being used for different things.  It just kind of growed
there... no other reason for being so dopey.

Some of you will have noted the misspelled variable in the command.
Bear in mind this was in the wee hours, with a weary typist at the
helm.

Consider what happened when I ran that cleanup..... I sat there for a
moment wondering what all the permission denied messages were about,
before coming to my senses and realizing it was chewing its way down
through the directory tree from `/' and being denied everything until
it started hitting stuff owned by its dumbbell author.....

   What the shell saw was rm -rf /*

Before I killed it, it had swept many files away including its own
script.
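
A guard against this kind of accident can be sketched as below, assuming a
POSIX shell.  It is only an illustration: the sandbox directory is created
fresh with mktemp for the demonstration, and the variable names simply mirror
the ones quoted above.

```shell
#!/bin/sh
# Sketch only: make a misspelled or unset variable fatal before rm ever runs.
set -u    # expanding an unset variable is now an error, not an empty string

# Hypothetical sandbox, created under a temp dir just for this demonstration.
testbase=$(mktemp -d)
maildir=spool
TESTDIR=$testbase/$maildir
mkdir -p "$TESTDIR"
touch "$TESTDIR/junk.log"

# ${VAR:?message} aborts the command if VAR is unset or empty, so the
# argument can never collapse to "/*".
rm -rf "${TESTDIR:?TESTDIR is not set}"/*
remaining=$(ls -A "$TESTDIR")

# The misspelled name from the story: the expansion fails inside the subshell
# and rm is never executed.
if (rm -rf "${TEST_DIR:?}"/*) 2>/dev/null; then
    typo_result=ran
else
    typo_result=refused
fi
echo "typo_result=$typo_result"

rm -rf "${testbase:?}"    # remove the demonstration directory
```

Either `set -u` or the `${VAR:?}` form alone would have stopped the cleanup
script; using both costs nothing.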

Well, not being a critical, serious commercial operation, I only had
a sort of home-grown backup system.... Still, it was able to recover
nearly all my $HOME .. (not all of it got wasted).  But it required
quite a bit of finagling to get back to normal.

[...]

Perhaps then you might archive off the logs?  Nightly incremental
gzipping or somesuch...

Yeah, something like that.  I have a script in `logrotate.conf' that
checks for a certain size.  When that is reached or surpassed, the log
is gzipped and rotated ... I keep 10 rotations of files that, before
gzipping, are close to 1.5 MB.

[...]

I call it a sandbox.  I redefine some things such as $SENDMAIL too, so
as to take the bite out of "!" and message creation functions.

I had never got into needing to redefine sendmail.. my usage, as you've
seen, is pretty primitive.

That last part is where the rub is.  I want to let procmail do most of
it by showing what was hit...exactly.  I will then be able to set the
regex accordingly or insert a new recipe as needed.

Have you actually TRIED what I suggested yet?

Sorry.. no I hadn't at that time.. I saw what looked like a weird
notation and was sort of thrown off by it.  Not understanding what it
was.

I'd searched for similar notation in the procmail man pages but fell
short of doing them all before giving up... a bad move.

I had grepped like this: 
man procmail|grep '\* *1\^'
Through procmail, procmailrc, and procmailex.....
I'd never really given much pause to `procmailsc'... so neglected it.

[...]

Just wait 'till you have two messages arrive at about the same time.
Then you'll have some grief.  LOCK - to you, it's just an extra colon
on the flags line...

I'm not saying it's smart or right, only that I haven't seen a problem I
recognized to be caused by not locking.  What would such a problem
look like?

Corrupted mailbox.  No fun.

OK, I'm in the dark here to some degree.  I was under the impression
that some kind of locking happens even without the extra `:'.  I
sometimes see a message while running tests...
          `unable to lock some.file'
which led me to think some kind of locking happens by default.
 
And having run thousands of messages into spools that are later scarfed
by gnus, I must have hit the circumstance where the event you describe
has occurred.... no?

Concerning the host escaping:  I haven't seen a false hit that I tracked
to being caused by that... probably sloppy, all right, but it seemed
much more important in the host numbers part.

Get in the habit of escaping dots where you expect them to be dots.
Eventually, it'll bite you in the ass if you don't escape a dot on a
broad regexp.
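
The difference can be seen with a quick shell test.  This is a sketch under
assumptions: `grep -E' stands in for procmail's own matcher, the helper name
`hits' is made up, and both header lines are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: compare an unescaped dot with an escaped one.
# grep -E stands in for procmail's matcher here (an assumption); both header
# lines below are invented for the demonstration.

# hits PATTERN LINE: print "hit" if LINE matches PATTERN, else "miss".
hits() {
    if printf '%s\n' "$2" | grep -Eq -- "$1"; then
        echo hit
    else
        echo miss
    fi
}

unescaped='[^0-9.]6[1-6].'     # "." matches any character
escaped='[^0-9.]6[1-6]\.'      # "\." matches only a literal dot

queue_id='Received: by mx.example.invalid id AB65432'
real_ip='Received: from unknown ([66.1.2.3])'

echo "queue id, unescaped: $(hits "$unescaped" "$queue_id")"   # hit (false positive)
echo "queue id, escaped:   $(hits "$escaped" "$queue_id")"     # miss
echo "host 66.x, escaped:  $(hits "$escaped" "$real_ip")"      # hit
```

With the dot unescaped, a queue id such as AB65432 is enough to trip a
pattern that was meant to catch host numbers starting with 61 through 66.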

No doubt, good advice..

[...]

I would have thought that would cause a whole different action since
then both must match.

Not with scoring.  Unscored tests MUST evaluate true (which is why I
say to move them to the top - if they fail, the scoring won't occur),
but scored ones need only _total_ a positive value.  If you don't use
scoring for its more advanced purposes, it at least allows you to
perform OR expressions easily.


1 is the base value for a match, ^1 is an "exponent" that says for
each ADDITIONAL match, multiply the base match by this.  If you just
wanted ANY match, you'd use ^0, but if we use a nonzero for the
exponent, procmail will continue looking for matches - and as a result
of the \/ match in the regexp, we'll get the matching line emitted to
the verbose log for each header that matches...
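
The arithmetic described above can be sketched in plain shell.  This only
illustrates the w^x weighting (w for the first match, w*x for the second,
w*x*x for the third, and so on) and is not procmail's own code; the function
name `score' is made up and it handles integer weights only.

```shell
#!/bin/sh
# score W X N: total contributed by a "W^X" condition that matched N times.
# Illustration only; integer weights, hypothetical helper name.
score() {
    w=$1 x=$2 n=$3
    total=0
    term=$w                        # the first match contributes W
    i=0
    while [ "$i" -lt "$n" ]; do
        total=$((total + term))
        term=$((term * x))         # each additional match is scaled by X
        i=$((i + 1))
    done
    echo "$total"
}

score 1 1 2    # 1^1 with two matching lines
score 1 0 3    # 1^0: only the first match counts
```

The first call prints 2 and the second prints 1.  The first case agrees with
the `Score: 2 2' line in the log further down, where two Received: lines
matched a 1^1 condition.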

Not sure I understand this, but at least you're saying the matched line
will appear in my log, and that will be what I wanted.

If I had the line that matched, I think it would be fairly easy to
tell what did it even with all those or things.

I'd suggest you TRY what I recommend, within a sandbox, and see
first-hand how it operates.

OK.. having done that... I don't get the matched line into the log
yet.  Probably something here set wrong but I don't see it so posting
the test setup.  So far I learned nothing more than what I already
knew about what matched the messages but still no actual line content
snagged from the message.  I guess it narrowed it down a little more
to the 62 66 numbers but again, I already knew that at a glance.

Is there some syntactical error or should I be seeing the actual
header line that matched, printed into my log?

I commented out X-Loop to get the original result...

========================================
[...]
        :0 D
        * ! ^Return-Path:.*redhat\.com|owner-
        * ! ^Sender:.*list
        * ! ^List-Id:
#       * ! ^X-Loop:
        * ! ^Delivered-To:.*lula@yahoogroups
        * ! ^From:.*Putnam
        * ! ^Mailing-List:
#[HP 11/30/01 16:29  ] 
        * ! ^To:.*reader@newsguy.com
        * ! ^Received:.*smtp10
        * 1^1 ^\/To:.*@pop\.newsguy\.com
        * 1^1 ^\/Received:.*\/(\.tw |\.kr|[^0-9.]202\.|[^0-9.]211\.|\
          [^0-9.]6[1-6]\.|bogota\.supernet\.com\.co)
       spam_suspect2.in
[...]
========================================

Here is the log it produced:
[...]
procmail: Match on ! "^Return-Path:.*redhat\.com|owner-"
procmail: Match on ! "^Sender:.*list"
procmail: Match on ! "^List-Id:"
procmail: Match on ! "^Delivered-To:.*lula@yahoogroups"
procmail: Match on ! "^From:.*Putnam"
procmail: Match on ! "^Mailing-List:"
procmail: Match on ! "^To:.*reader@newsguy.com"
procmail: Match on ! "^Received:.*smtp10"
procmail: Score:       0       0 "^\/To:.*@pop\.newsguy\.com"
procmail: Assigning "MATCH="
procmail: Matched "(66."
procmail: Assigning "MATCH="
procmail: Matched "[62."
procmail: Score:       2       2 "^\/Received:.*\/(\.tw |\.kr|[^0-9.]202\.|[^0-9.]211\.|[^0-9.]6[1-6]\.|bogota\.supernet\.com\.co)"
procmail: Assigning "LASTFOLDER=spam_suspect2.in"
procmail: Opening "spam_suspect2.in"
procmail: Acquiring kernel-lock
procmail: Notified comsat: "reader@4985:/home/reader/projects/proc/no_bak/spool/spam_suspect2.in"
From bounce-debian-user=reader=newsguy.com@lists.debian.org  Mon Feb  4 18:06:07 2002
 Subject: Re: moving windows
  Folder: spam_suspect2.in                   2755
procmail: Assigning "EXITCODE=0"
procmail: Executing "formail -XMessage-Id: && date +"%b %d %T%nSTOP""
Message-ID: <20020202170043.GB2363@fishbowl.madduck.net>
[...]
========================================

And the head of the testrc that produced them:

========================================
## --*-shell-script-*--
#era
SHELL=/bin/sh           # always always 
PATH=/bin:/usr/bin:/usr/local/bin:/sbin:/usr/sbin
SHELL=/bin/sh
MAILDIR=/home/reader/projects/proc/no_bak/spool
LOGFILE=/home/reader/projects/proc/no_bak/.log
ORGMAIL=/home/reader/projects/proc/no_bak/$LOGNAME
DEFAULT=$ORGMAIL
VERBOSE=YES 

#[HP 10/29/01 05:46 Note the different syntax below for sticking date into LOG var
# Single quotes around format string and double quotes around the whole thing 
# including a real newline ] 
LOG="`echo -e 'START'`
"
LOG="`date +'%b %d %T %w '`
"
TRAP='formail -XMessage-Id: && date +"%b %d %T%nSTOP"'

[...] 
========================================