procmail

Re: puzzled about a regexp

2003-01-12 12:04:49
"Nikos K. Kantarakias" <nikant(_at_)freemail(_dot_)gr> wrote:

really that different from:

*

me[  ]*=.*\.(bat("|[  ]*|$)|pif("|[  ]*|$)|vb[as]("|[  ]*|$)|scr("|[  ]*|$)
|lnk("|[  ]*|$)|com("|[  ]*|$)|exe("|[  ]*|$)|{[-0-9a-f]+}("|[  ]*|$))


I mean
* name[     ]* etc..

and yes it really seems that the second way is the correct..I just don't
like the sight of it ;-)

Well, although you state that with a smiley and as an aside, it is actually
an important point: it is much better not to let a train of almost
indecipherable scrawlings overpower the design and comprehensibility of what
you're doing.

I want to urge people to start compartmentalizing what they're doing,
by way of variables and other tools to render the expressions more
self-documenting and reduce the likelihood of avoidable error.  I don't claim
to be perfect at practicing what I preach.  But I'm trying.

Here's an example from my paltry two virus snaggers.  This is how one looks
in my rc:

 :0  # 030105 () based on original from Philip Guenther, procmail's maintainer
  * $   $GO^0         ^Content-[-a-z0-9_]+:.*=\"?[^\"]*\.$NASTYEXT
  * $ $STOP^0       !  CTYPE ?? ^^multipart
  * $   $GO^0 B ??    ^Content-[-a-z0-9_]+:.*($[$WS].*)*=[$WS]*\
                                           ($[$WS]+)*\"?[^\"]*\.$NASTYEXT
   { RX = "${RX:+$RX, }VIR_01" }


Still looks a little ugly, but I can at least follow the logic of the
algorithm at a glance.  ("$GO" I've defined elsewhere earlier as an
oversaturated "infinity," and "$STOP" as its inversion, an oversaturated
infimum.  The former skips immediately to the action-line; the latter
immediately aborts the recipe.  "$CTYPE" contains what was in the Content-Type:
header.)
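
For anyone following along, here is a minimal sketch of how those helper
variables might be set up.  The values are assumptions, not copied from the
rc above: the magnitude is borrowed from the 9876543210 weight in the older
recipe, WS is guessed to be a space plus a tab, and the CTYPE extraction is
just one common way to capture a header with \/ and $MATCH.

```
# Assumed definitions; the post only says these are "defined elsewhere".
# Any weight beyond procmail's score limits should saturate the score.
GO   = 9876543210
STOP = -9876543210

# Assumed: a space followed by a literal tab, for use inside [$WS].
WS   = " 	"

# Capture the Content-Type: header value, minus leading whitespace,
# so that a later "CTYPE ?? ^^multipart" test can anchor at the start.
:0
* ^Content-Type:[ 	]*\/[^ 	].*
{ CTYPE = "$MATCH" }
```

With definitions like these, "* $ $GO^0 regexp" expands to a weighted
condition that saturates the score as soon as it matches, and
"* $ $STOP^0 ! CTYPE ?? ^^multipart" drives the score to the infimum for any
message that is not multipart.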

"$NASTYEXT" I keep in $HOME/.procmail/vars/spamvars, which gets called as
an INCLUDERC.  It's not nearly as nasty as yours.

 NASTYEXT   = (hta|pif|scr|shs|vb[se]|ws[fh]|(doc|txt|xls)\\.)
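
As a sketch of how that file gets pulled in (the path is the one named above;
placing the line near the top of the main rc is an assumption), the include
only has to appear before the first recipe that uses $NASTYEXT:

```
# In the main rc, before any recipe that references $NASTYEXT.
INCLUDERC = $HOME/.procmail/vars/spamvars
```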


Let's go back quite a few incarnations of this to something a lot closer
to Philip's original.  (His original can be found in the list archives,
but I'm not going to look for it right now.  Instead, I'll use the version
from my archived .procmailrc of over a year ago.)

      # added `pif' on 21-Sep-2001
      # added `(doc|txt)\.' on 26-Jul-2001
      # (succeeded on "Homepage" virus 25-May-2001)
  :0  # conditions here came direct from Philip Guenther
  * 9876543210^0 ^Content-[-a-z0-9_]+:.*="?[^"]*\.(vb[se]|ws[fh]|hta|shs|\
                  pif|(doc|txt|xls)\.)
  * 9876543210^0 B ?? ^Content-[-a-z0-9_]+:.*($[        ].*)*=[  ]*\
                       ($[      ]+)*"?[^"]*\.(vb[se]|ws[fh]|hta|shs|pif|\
                       (doc|txt|xls)\.)
  { RECIPE = "${RECIPE:+$RECIPE }VIR_01" }


I don't know about you, but I find the first much easier on the eyes.  (I also
added an "abort" action so that body greps are not necessary if no multipart
Content-Type: header was expressed.  To be fair, I do think that Philip
did the same in a follow-up to his original, some two or three years ago.)

As an aside, notice the odd quoting I needed for the quotation marks once
I turned on shell-style expansion with the leading $'s on the condition
lines.  I mention this because it stymied me briefly when I first cleaned up
my syntax.
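
To make the quoting difference concrete, here is a cut-down pair of recipes.
This is a sketch only: the shortened NASTYEXT value and the RX strings are
made up for illustration.  As I understand the procmailrc man page, the $
flag makes procmail run the condition through sh-style double-quote
substitution before parsing it as a regexp, which is why a quotation mark has
to be written as \" there.

```
# Illustrative, shortened extension list.
NASTYEXT = (pif|scr)

# No $ flag: the condition is used as written, so a bare " is fine.
:0
* ^Content-[-a-z0-9_]+:.*="?[^"]*\.(pif|scr)
{ RX = "literal" }

# With the $ flag: $NASTYEXT is expanded, and each " is escaped as \".
:0
* $ ^Content-[-a-z0-9_]+:.*=\"?[^\"]*\.$NASTYEXT
{ RX = "expanded" }
```

After expansion, both conditions should hand the same regexp to the matcher.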

-- 
dman


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
