
Re: Plaintext base64 error.hta attachment

2003-05-31 15:15:02
On Sat, May 31, 2003 at 01:42:05PM -0700, procmail(_at_)deliberate(_dot_)net wrote:

On Sat, 31 May 2003 11:47:13 +0200, Dallman Ross
<dman(_at_)nomotek(_dot_)com> wrote:

      Thanks for posting it.

=>  :0  # 030403 () based loosely on an original from Philip Guenther
=>   * $           $GO^0    ^Content-[^$WS]+:.*=$DQ?[^$DQ]*\.$NASTYEXT
=>   * $         $STOP^0  !  CTYPE ?? ^^multipart

      I like the way it uses the Content header to decide
whether to do the next condition.  I haven't used negative
scoring like that, but it makes a lot of sense and makes for
nicely readable and efficient code.

Actually, Philip didn't use that part; I added it.  He did follow
up (see the archives) with an improved version of his hasty original,
one that had a condition line throwing out messages with no
multipart in the Content-Type:.  But that was after a number of
people had propagated the simpler recipe across web sites, where it
can still be found.  Oh, well.

I use $STOP fairly often, because it can do things that a simple
non-weighted condition can't, on occasion, and because it's less
overhead: the unweighted condition is only parsed after all weighted
ones.  So

        * 1^0      foo
        * ^Content-Type:.*multipart/alt
        * 1^0 B ?? bar

is going to run the body grep anyway, *then* decide it didn't need
to after all when the Content-Type: header isn't multipart/alternative.
The $STOP^0 trick precludes that.  We bail at $STOP if it matches:

        *       1^0         foo
        * $ $STOP^0      ! ^Content-Type:.*multipart/alt
        *       1^0 B ??    bar

Note that we require a negation here, where we didn't above, for
the logic to work.  "Stop if it's *not* in the header" is logically
equivalent to "continue if it's in the header"; but we can use the
short-circuit feature of negative-infinity to bail now.
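
For reference, a minimal sketch of how the pieces might sit together in
an rcfile.  The GO and STOP values follow the definitions quoted further
down in this message; the "foo" and "bar" patterns are the placeholders
from the example above, and the folder name "suspect" is hypothetical.

```procmail
# Oversaturated "infinity" and its negative (values as quoted below).
GO   = 9876543210
STOP = "-$GO"

# Score 1 for foo in the header; bail at once unless the message is
# multipart/alternative; only then grep the body for bar.
:0
*         1^0      ^Subject:.*foo
* $ $STOP^0      ! ^Content-Type:.*multipart/alt
*         1^0 B ?? bar
suspect
```

Comments are kept off the condition lines on purpose, since procmailrc
does not reliably allow them there.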



=> * $  B  ??  $GO^0  ^Content-[^$WS]+:.*($[$WS].*)*=[$WS]*($[$WS]+)*$DQ?\
=>                                         [^$DQ]*\.$NASTYEXT

      I'm confused about the ($[$WS]+)* part. How does that
parse?

The first $ is EOL.  The group reads: "(newline, followed by some
whitespace), occurring zero or more times."
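
As an illustration only, here is how the tail of that condition might
walk over a folded header.  The sample header is invented, and the
breakdown assumes WS holds a space and a tab, DQ holds a double quote,
and NASTYEXT includes hta; none of those definitions appear in this
message.

```procmail
# Sample header in a MIME part, folded right after the "=":
#
#   Content-Disposition: attachment; filename=
#           "error.hta"
#
# Tail of the condition, piece by piece:
#
#   =              the literal "=" ending the first line
#   [$WS]*         any blanks left on that line (none here)
#   ($[$WS]+)*     the fold: newline, then the leading blanks
#   $DQ?           the opening quote
#   [^$DQ]*        error
#   \.$NASTYEXT    .hta
```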

 
=> $GO is an oversaturated "infinity" of 9876543210.  $STOP is its negative.

      Nice.  I use an $OR variable that holds the scoring
part, like this:
      OR = "2147483647^0"
so that I can write a list of absolutes such as:

:0
* $ $OR sometest here
* $ $OR someothertest here
* $ $OR etc etc etc


That's fine.  Here's a caveat to using the exact "infinity" value, though.
Suppose we have this:

        * $      $OR  foo
        * $     -1^0  matchbar
        * $      $OR  matchfoo
        * $  $STOP^0  matchfoobar
        * $      $OR  nomatch

You'd want to skip to the finish line on "matchfoo", right?  But you'd
fool yourself.  The condition matches, but the score hasn't reached
infinity (it's one short, because "matchbar" just above also matched and
subtracted 1), so there is no short-circuit.  Procmail descends through
the rest of the recipe just as with any other weighted condition set,
gets to "matchfoobar", and bails.  Whoops!  Not what we intended.

That's why I use a (well-) oversaturated value.
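
A sketch of the same condition list with an oversaturated OR, assuming
the 9876543210 value quoted earlier.  Redefining OR in terms of GO is an
assumption made here for illustration, not something stated in the
thread, and "somefolder" is a hypothetical destination.

```procmail
GO   = 9876543210
STOP = "-$GO"
OR   = "$GO^0"

# After matchbar (-1) and matchfoo (+9876543210) the running score is
# 9876543209, still far above 2147483647, so the remaining weighted
# conditions should be skipped and the recipe should match.
:0
* $      $OR  foo
* $     -1^0  matchbar
* $      $OR  matchfoo
* $  $STOP^0  matchfoobar
* $      $OR  nomatch
somefolder
```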

There are occasions when I want the exact infinity value, too, but
then I have other math tricks in mind (the "infinity shuffle," as I've
renamed the trick lately).


-- 
dman

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
