procmail

RE: help with scoring & size condition

2003-01-11 13:30:26
Martin McCarthy wrote:

[Dallman Ross wrote:]

[Matt Garretson <mattg(_at_)assembly(_dot_)state(_dot_)ny(_dot_)us> wrote:]
I'm having a problem with a scoring recipe, in which
I want one of the conditions to hinge on the message
size.

I fell for the same issue a couple of months ago.  It seems
to be an area of procmail that is not at all well-documented.

It's actually documented in an example in the scoring page:

    $ man procmailsc

          [ ... ]
              * -100^3   > 2000
              priority_folder

but doesn't work.

Hmm.  Okay, thanks for the pointer to the erroneous man pages.  :)


An alternative to dman's suggestion *might* (depending on just what
you're hoping to do) be to do something like:

  BIGSTUFF
  :0
  * HB ?? > 5000
  { BIGSTUFF=1 }

  :0
  * -1^0
  * 2^0 BIGSTUFF ?? ^^1^^
  {
    LOG="SIZE GREATER THAN 5000${NEWLINE}"
  }


Okay.  That works.  You don't need the "HB ??" thing for
the `>' token, as that's the default.  I know you know that,
Martin, if you'd only thought about it for a microsecond.
I also know you know you'd want that syntax for just the body 
or just the head.  (Often procmail-list messages have just 
humongous headers, over 1K in size, while the bodies are 
sometimes quite short.  Hmm.)

If we're going to set BIGSTUFF to a number, then we can use
scoring syntax in a compact way:

        BIGSTUFF  # clear any existing value
        :0
        * > 5000
        { BIGSTUFF = 2 }

        :0 A  # we really don't want to forget the 'A' flag here
        *           -1^0
        * $  $BIGSTUFF^0
        { LOG = "SIZE > 5000 $NEWLINE" }

Or, if we don't want to depend on remembering the `A' flag (but
still don't want to bother to put the second recipe inside the curly
braces of the first), we could give BIGSTUFF a default of 0 and write
the second recipe like so:

        :0
        *                -1^0
        * $  ${BIGSTUFF:-0}^0
        { LOG = "SIZE > 5000 $NEWLINE" }
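
For anyone who finds the arithmetic easier to follow outside of
procmail, here is a minimal Python sketch of the flag-plus-score idea
above.  It is only a toy model, not procmail itself: the names
bigstuff_for and second_recipe are made up for illustration, and it
assumes a scored recipe fires exactly when its final total is positive.

```python
def bigstuff_for(size, limit=5000):
    """Mimic the first recipe: BIGSTUFF is 2 above the limit, else unset (None)."""
    return 2 if size > limit else None


def second_recipe(bigstuff):
    """Mimic the second recipe: -1^0 plus ${BIGSTUFF:-0}^0, fire if total > 0."""
    # ${BIGSTUFF:-0} falls back to 0 when the variable is unset
    weight = bigstuff if bigstuff is not None else 0
    total = -1 + weight
    return total > 0


if __name__ == "__main__":
    for size in (100, 5000, 6000):
        print(size, second_recipe(bigstuff_for(size)))
```

The -1 is what makes an unset BIGSTUFF fail the recipe; any value
greater than 1 would do in place of the 2.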



However, I've now stumbled onto something really big.(!)  This
is pretty cool, I think:  One can leverage the wished-for size
limit off of "infinity" (for which, see `man procmailsc') to
minimize processing impact.  Let me explain further.

It has bothered me that, when doing size comparisons, one always had
either to set a preordained upper limit (such as the 5000 above), beyond
which the given recipe won't process the mail at all; or to ignore the
impact on the system altogether and risk needlessly taxing the server
when a huge message comes along that we would rather have ignored.

Well, I have now devised a "better way."  This goes to what
I observed and reported last year about a merely "saturated" score 
("infinity," exactly) versus an "oversaturated" one.  Remember, 
per the procmailsc man pages, that -- 

       [a]s  soon  as  `plus infinity' (2147483647) is reached, any
       subsequent weighted conditions will simply be skipped.


We want to use infinity to skip other scored conditions; essentially, 
to bolt right to the finish line early.  Infinity and its behavior
comprise essentially the only real way to get a subtotal score 
*within* a recipe, before we actually arrive at the action line.  But 
if we don't oversaturate infinity, we might have an unexpected result:

        :0
        *          -1^0  some condition
        *  2147483647^0  another condition
        *          -2^0  odd, but we're still here!  We didn't quite \
                                           "reach" infinity yet
        * -2147483647^0  this recipe will now fail
        { action never reached }

By oversaturating the value used for infinity (e.g., 9876543210), we
do skip right to the action at the second condition, above.  The -1
from the first condition can no longer pull the total below infinity.
That's better, in most instances, in that it's probably what we
expected.
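
To make the difference concrete, here is a small Python sketch of the
running score as I've described it.  It is a toy model under stated
assumptions, not procmail's actual code: run_score is a hypothetical
helper, every condition is assumed to match, and only the `plus
infinity' ceiling is modeled.

```python
INFINITY = 2147483647  # "plus infinity" per the procmailsc text quoted above


def run_score(weights):
    """Add weights in order; once the total reaches INFINITY, skip the rest."""
    total = 0
    for w in weights:
        if total >= INFINITY:
            break  # saturated: remaining weighted conditions are skipped
        total = min(total + w, INFINITY)
    return total


if __name__ == "__main__":
    # Merely "saturated": -1 first, so we land one short of infinity
    print(run_score([-1, 2147483647, -2, -2147483647]))  # -3, recipe fails
    # "Oversaturated": the big weight overshoots, so the total pins at infinity
    print(run_score([-1, 9876543210, -2, -2147483647]))  # 2147483647
```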

All right, but now to my point.  Let's bounce the size limit we want 
off of the saturation point for the supremum.  Then we can go ahead and
bother to count individual characters[1] without risking taxing
the server more than we'd like or can predict.

Okay, suppose we want to know the exact size of a message if it's 
under 5000 bytes, but otherwise not bother.  Well, we could do 
something like the above, putting in there a condition like this:

        * < 5000

and then doing the count:

        * 1^1  .


But if all we're really interested in is the first 1000 bytes, the
count above won't stop there; it runs over the whole message.  We
*can*, however, stop counting early if we do what I've been building
up to.  First let's make things easier on ourselves and set a variable:

        SUPREMUM = 2147483647

We'll first subtract the number of bytes we're really interested in
for our counting limit.  That way, when we add the supremum in the
second condition line, the running score sits exactly 1000 short of
infinity, so we won't skip the rest of the scored conditions.  We
haven't, after all, even reached infinity, let alone oversaturated it:

        :0
        *                -1000^0
        * $          $SUPREMUM^0
        *    BH  ??          1^1   .
        * $         -$SUPREMUM^0
        { OVER_1000 = yes }

        :0 E
        * $   $=^0
        *   1000^0
        { UNDER_1000_ACTUAL_SIZE = $= }
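
Here is the same bounce walked through in Python, as a rough sketch
only.  The function name bounce and its return tuple are invented for
illustration.  It treats every character as a match for `.' (which may
not be how procmail handles newlines), and it assumes counting stops
the moment the score reaches infinity.

```python
INFINITY = 2147483647  # the SUPREMUM from the recipe above


def bounce(message, limit=1000):
    """Return (status, counted_size, chars_scanned) for a message string."""
    score = -limit        # * -1000^0
    score += INFINITY     # * $SUPREMUM^0 : now exactly `limit` short of infinity
    scanned = 0
    for _ in message:     # * 1^1 .  : one point per character
        if score >= INFINITY:
            break         # saturated: stop counting early
        score += 1
        scanned += 1
    if score >= INFINITY:
        # the -$SUPREMUM condition is skipped; first recipe succeeds
        return ("over", None, scanned)
    score -= INFINITY     # * -$SUPREMUM^0 : leaves (size - limit), negative
    # first recipe fails; the E recipe adds the limit back onto that score
    return ("under", score + limit, scanned)


if __name__ == "__main__":
    print(bounce("x" * 250))        # ('under', 250, 250)
    print(bounce("x" * 6_000_000))  # ('over', None, 1000)
```

Note that in this model a message of exactly 1000 characters lands on
infinity and so is reported as "over".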


I don't know about the rest of you, but I find that exciting!
I timed it with my shell's `time' routine, first with the
saturation-bounce and then without it, three runs each, on a 6MB+
file.  The differences were quite discernible (roughly 0.4 seconds
elapsed versus 1.7 to 1.9):

Using the "Ross Saturation-Bounce":
 0.240u 0.082s 0:00.43 74.4%     0+0k 0+0io 0pf+0w
 0.181u 0.163s 0:00.46 73.9%     0+0k 0+0io 0pf+0w
 0.135u 0.194s 0:00.41 78.0%     0+0k 0+0io 0pf+0w

Processing the whole damn file regardless:
 1.461u 0.125s 0:01.72 91.8%     0+0k 0+0io 0pf+0w
 1.483u 0.118s 0:01.86 85.4%     0+0k 0+1io 0pf+0w
 1.406u 0.213s 0:01.76 91.4%     0+0k 0+0io 0pf+0w

Well, the above's what I was thinking about at 4 a.m. and what
I had to get up to write down on a paper notepad (the recipe 
syntax, I mean) before I could go back to sleep.  :-)

Ob-Oh: I sure wish the list moderator would regain consciousness
and send the "non-member" submissions he's collected (one of which
is mine from yesterday); and I furthermore wish he'd fix the fubar
procmail-users(_at_)procmail(_dot_)org list alias, which has been broken for,
like, a year!


[1] I believe this "1^1 ." stuff was first proposed by David Tamkin.
It was, in any case, years ago.

-- 
Dallman Ross

"If you find a path with no obstacles, it probably does not lead to
anywhere."
        Thoughts of Rev. Sunnan Kubose, from _Zen in the Markets_ 


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail