procmail

Re: Scoring by size ???

2003-09-28 14:58:59
On Sat, Sep 27, 2003 at 11:39:23AM -0700, Bart Schaefer wrote:

> On Sat, 27 Sep 2003, Dallman Ross wrote:
> 
>> The answer is that scoring syntax is not compatible with
>> the special size tests (> or <) that we can alternatively
>> use.  This is just one of those things one learns about
>> procmail (and then forgets, and has to learn again).
> 
> It's not that it's "incompatible," really, it's just that it works
> differently.

Right.  But you can't put them both together, as he tried to do,
with any hope of success.
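For reference, procmailsc(5) documents what a weighted size condition contributes, which is the sense in which it "works differently". For a message of M bytes, "* x^y > L" adds x * (M/L)^y to the score, and "* x^y < L" adds x * (L/M)^y. The contribution therefore scales with the message size instead of being the fixed x that a matching x^0 regex condition adds. Below is a minimal Python sketch of just that arithmetic. The function name size_score is hypothetical, and the formula should be checked against the man page of your own procmail version.

```python
def size_score(x: float, y: float, op: str, limit: float, msg_len: float) -> float:
    """Score added by a weighted size condition "* x^y > L" or "* x^y < L".

    Follows the formula in procmailsc(5): x * (M/L)**y for ">" and
    x * (L/M)**y for "<", where M is the message length in bytes.
    """
    if op == ">":
        return x * (msg_len / limit) ** y
    if op == "<":
        return x * (limit / msg_len) ** y
    raise ValueError(f"unsupported operator: {op!r}")


# A 20000-byte message tested against a 10000-byte limit with weight 1^1.
print(size_score(1, 1, ">", 10000, 20000))  # 2.0
print(size_score(1, 1, "<", 10000, 20000))  # 0.5
```

With y = 0 the ratio drops out and the condition always adds exactly x, whatever the size.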

>> Keep in mind that when you mix weighted and unweighted conditions,
>> all weighted conditions are processed before any unweighted condition
>> is looked at.
> 
> That's not correct.  Conditions are processed in order; you can see this
> from verbose log output.  The only difference is that, with the exception
> of scores that hit +/- supremum, weighted conditions continue to be
> computed even when the result could not change the outcome of the recipe.
> A non-matching unweighted condition mixed in will still stop the recipe
> (and the score) at the point where it appears.

Thank you.  I should have known better.  I did know better, but at some
point my memory failed me.  I think I'm getting older.

I was, however, after a slightly different point from the one you
explained so well above.  To wit, let's take this example recipe:

 :0
  *  200^0   ! ^From:
  *  100^0   ! ^Message-ID:
  *           ^^From ()
  *            ^Date:
  *            ^Received:
  { WE_ARE_HERE }


We know by looking that if both of the first two conditions fail to match,
there is no point in going on.  Even if the bottom three are true,
the recipe will never get to WE_ARE_HERE.  But procmail doesn't
know that.  (Bart knows it; don't take this as my attempt to teach
him.)

Here, from my test harness, we see that the three unweighted conditions
were tested (see the arrows), even though neither of the two weighted
conditions matched:

 11:18pm [~/Mail] 395[0]> harness SPAMPLE

    [. . . .]

    procmail: [11469] Sun Sep 28 23:22:52 2003
    procmail: Score:       0       0 ! "^From:"
    procmail: Score:       0       0 ! "^Message-ID:"
->  procmail: Match on "^^From ()"
->  procmail: Match on "^Date:"
->  procmail: Match on "^Received:"
    procmail: Assigning "HOST=byebye"
    procmail: HOST mismatched "panix5.panix.com"
    From esteves(_at_)eiffel(_dot_)com  Thu Sep  4 07:28:36 2003
     Subject: Re: new mail                                                      
   
      Folder:                                    1173


What I was trying to convey while misstating things miserably was:
Try to think through your algorithm the way procmail will see it.
(It helps immensely to have a test harness, known also as a sandbox.)
In the sample above, why make procmail look at every single
message to see whether it has a From_, Date:, and Received:
header, even in the cases where all of our earlier, positively
scored conditions have already failed?  It's just extra work
for the mail server.

There are a number of ways to improve the efficiency of that
recipe.  Perhaps the easiest way is to put the unweighted
conditions before the weighted ones:

 :0
  *           ^^From ()
  *            ^Date:
  *            ^Received:
  *  200^0   ! ^From:
  *  100^0   ! ^Message-ID:
  { WE_ARE_HERE }

Now, if there's no Date: header (or From_ or Received:, albeit
damned unlikely), we bail right there.
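To make the difference concrete, here is a simplified Python model of the evaluation order Bart describes: conditions run top to bottom, weighted ones add to a score and never stop evaluation, and the first failing unweighted condition ends the recipe. It is a sketch under stated assumptions, not procmail's code. It handles only the x^0 weights used in these recipes, ignores the +/- supremum case, assumes a recipe with weighted conditions matches only when the final score is positive, and hand-translates "^^From ()" to Python's \AFrom anchor. The names Condition and evaluate, and the header text, are made up for the example.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Condition:
    pattern: str                  # Python regex standing in for the procmail one
    weight: Optional[int] = None  # x in "x^0"; None marks an unweighted condition
    negate: bool = False          # True for "!" conditions


def evaluate(conditions: List[Condition], header: str) -> Tuple[bool, int, int]:
    """Return (recipe matched, final score, number of conditions tested)."""
    score = 0
    tested = 0
    has_weighted = False
    for cond in conditions:
        tested += 1
        found = re.search(cond.pattern, header, re.IGNORECASE | re.MULTILINE) is not None
        hit = found != cond.negate  # a "!" condition succeeds when nothing is found
        if cond.weight is None:
            if not hit:
                return False, score, tested  # unweighted miss: stop right here
        else:
            has_weighted = True
            if hit:
                score += cond.weight  # x^0 adds x once, however many lines match
    return (score > 0 if has_weighted else True), score, tested


ORIGINAL = [
    Condition(r"^From:", 200, True),
    Condition(r"^Message-ID:", 100, True),
    Condition(r"\AFrom "),
    Condition(r"^Date:"),
    Condition(r"^Received:"),
]
REORDERED = ORIGINAL[2:] + ORIGINAL[:2]  # unweighted conditions first

# Hypothetical header with no Date: line, and the same header with one.
NO_DATE = (
    "From someone@example.com  Thu Sep  4 07:28:36 2003\n"
    "Received: from mail.example.com\n"
    "From: someone@example.com\n"
    "Message-ID: <1@example.com>\n"
)
FULL = NO_DATE + "Date: Thu, 4 Sep 2003 07:28:36 -0700\n"

print(evaluate(ORIGINAL, FULL))      # (False, 0, 5)
print(evaluate(ORIGINAL, NO_DATE))   # (False, 0, 4)
print(evaluate(REORDERED, NO_DATE))  # (False, 0, 2)
```

The first call is the situation in the harness log above: all five conditions are examined although the score never leaves 0. With the unweighted conditions moved to the top, the Date:-less message is rejected after two tests instead of four.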

One can and should also think about which conditions are most likely
to fail, and put them first of all.  While almost every message will
have a From_ and a Received: header, the likelihood is greater (among
these three unweighted conditions) that the Date: header will be
missing.  (Not so likely, I know, but more likely than that there is
no Received: header, anyway.)

So I'd try to order the recipe like so:

 :0
  *            ^Date:
  *           ^^From ()
  *            ^Received:
  *  200^0   ! ^From:
  *  100^0   ! ^Message-ID:
  { WE_ARE_HERE }
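The payoff of this ordering can be estimated. Suppose each unweighted condition i fails independently with probability q_i, and the weighted conditions, which never stop evaluation, sit at the end. Then the expected number of conditions tested per message is the sum, over positions, of the probability that evaluation gets that far: 1 + (1 - q_1) + (1 - q_1)(1 - q_2) + ... When every test costs about the same, swapping two adjacent conditions shows that the one more likely to fail should come first. The sketch below compares the order From_, Date:, Received: with the order Date:, From_, Received:. The miss rates in MISS are purely hypothetical, not measured, and expected_tests is an illustrative name.

```python
# Hypothetical probabilities that each header is missing (NOT measured data).
MISS = {"Date:": 0.01, "From_": 0.0001, "Received:": 0.001}
N_WEIGHTED = 2  # the two scored conditions, always tested once reached


def expected_tests(order):
    """Expected number of condition tests for the unweighted conditions in
    `order` followed by N_WEIGHTED weighted conditions."""
    total = 0.0
    reach = 1.0  # probability that evaluation gets to the current position
    for name in order:
        total += reach
        reach *= 1 - MISS[name]
    return total + N_WEIGHTED * reach


from_first = ("From_", "Date:", "Received:")
date_first = ("Date:", "From_", "Received:")
print(f"From_ first: {expected_tests(from_first):.6f}")
print(f"Date: first: {expected_tests(date_first):.6f}")
```

With these made-up rates, testing Date: first saves about 0.0099 tests per message (the difference between 1 - q for From_ and 1 - q for Date:). The saving is small in absolute terms, but it is paid on every message procmail sees.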

This does bring up a question, though, the answer to which I don't know:
Since From_ is always above Date:, are we making procmail do extra work
by starting with a test of Date: and then moving "upstream" to From_?

In any event, thank you, Bart, for stopping my too-quickly-scribbled
bad information, and for causing me to state what I'd wanted to
with greater care.  (Hopefully, it's also now mostly correct.)

Dallman

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
