Re: Score and _AND_

At 18:23 2002-10-10 +0221, Udi Mottelo wrote:

        In the first one procmail scans the message three times.  In the
        second, only once.  Is it because of the 1^1?

That expression means that it will continue scanning the source until the end (otherwise, how could it know whether there were additional hits to affect the scoring?). Any time there is a nonzero number after the caret (^), you can figure you're going to scan to the end of the source to satisfy the condition.
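To make the exponent's effect concrete, here is a small Python model of a single `w^x regex` condition. It assumes the rule described in procmailsc(5): the first match adds w, the second w*x, the third w*x^2, and so on. It is not procmail's code, and `score_condition` is a made-up name. `re.IGNORECASE` is used because procmail conditions are case-insensitive by default. With x = 0, every hit after the first adds nothing, so the model can stop early. With x = 1, every hit counts, so it has to read to the end.

```python
import re


def score_condition(weight: float, exponent: float, pattern: str, text: str) -> tuple[float, int]:
    """Simplified model of one 'weight^exponent regex' condition.

    The first match adds weight, each later match adds the previous term
    times exponent. Returns (score, matches_examined). Not procmail's code.
    """
    score = 0.0
    examined = 0
    term = float(weight)
    for _ in re.finditer(pattern, text, re.IGNORECASE):
        examined += 1
        score += term
        term *= exponent
        if term == 0:
            # Every later match would add 0, so there is no reason to keep scanning.
            break
    return score, examined


body = "cheap pills, cheap watches, cheap loans"
print(score_condition(1, 1, "cheap", body))  # (3.0, 3): every hit counts
print(score_condition(1, 0, "cheap", body))  # (1.0, 1): stops at the first hit
```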

   Does 1^0 make it scan like (A|B|C)?


No, not at all.

* 1^0 A
* 1^0 B
* 1^0 C

Will attempt *ALL THREE* conditions (regardless of whether the other two matched or not), and, as necessary, will scan the entire source (three times) in the attempt to match.


* (A|B|C)

SHOULD scan the body once, and will stop at the first occurrence of any of the three texts. I'm not intimate with the internals of the procmail regexp engine; within the regexp processor it will be examining memory multiple times, but that is a very different approach from the three separate conditions above (and the more distinctive the alternatives are, a string versus single characters, the more efficient the regexp engine should become).
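A crude way to put numbers on the difference is to count the characters read before each search succeeds, or the whole body when it fails. This Python sketch takes that cost model as an assumption. A real regexp engine, procmail's included, does more work per character, and `scan_cost` is a hypothetical helper. Even so, it shows why three separate 1^0 conditions can cost up to three passes while a single alternation costs at most one.

```python
import re


def scan_cost(pattern: str, text: str) -> int:
    """Crude cost model: characters read until the first match ends,
    or the whole text if the pattern never matches."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.end() if m else len(text)


body = "x" * 100 + "A" + "x" * 100  # 201 characters, "A" near the middle

# Three separate '1^0' conditions: each one is searched from the top.
separate = sum(scan_cost(p, body) for p in ("A", "B", "C"))
# One alternation: a single pass that stops at the first A, B or C.
combined = scan_cost("A|B|C", body)

print(separate, combined)  # 503 101
```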

  I'm used to breaking _OR_ regexps into score
        style to make the recipes more readable; is that wrong (from the
        performance point of view)?


Use maximal scoring:

* 9876543210^0 A
* 9876543210^0 B
* 9876543210^0 C

The actual number for maximal is much smaller: procmail caps scores at 2147483647, roughly 2^31 (as an actual power of two, not as a scoring expression!). But the above number is VERY EASY to remember and is just as effective, since anything larger is simply capped at the maximum.

*AS*SOON*AS* there is a match on this scoring, it jumps to the delivery line, skipping the other scoring conditions. If you use some small number, you're going to have to run through ALL of the conditions.
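The short-circuit can be modelled in Python under two assumptions taken from procmailsc(5). First, the running total is clamped to plus or minus 2147483647. Second, procmail stops evaluating conditions once the total reaches either limit. `evaluate` and `MAX_SCORE` are illustrative names, not procmail features.

```python
import re

MAX_SCORE = 2147483647  # assumed score ceiling, per procmailsc(5)


def evaluate(conditions, text):
    """Model a weighted recipe; each condition is (weight, exponent, pattern).

    Returns (matched, total, conditions_evaluated). The total is clamped to
    +/-MAX_SCORE and evaluation stops as soon as it saturates.
    """
    total = 0
    evaluated = 0
    for weight, exponent, pattern in conditions:
        evaluated += 1
        term = weight
        for _ in re.finditer(pattern, text, re.IGNORECASE):
            total += term
            term *= exponent
            if term == 0:
                break  # later hits of this condition add nothing
        total = max(-MAX_SCORE, min(MAX_SCORE, total))
        if abs(total) == MAX_SCORE:
            break  # saturated: remaining conditions cannot change the outcome
    return total > 0, total, evaluated


body = "A B C"
maximal = [(9876543210, 0, p) for p in ("A", "B", "C")]
small = [(1, 0, p) for p in ("A", "B", "C")]
print(evaluate(maximal, body))  # (True, 2147483647, 1)
print(evaluate(small, body))    # (True, 3, 3)
```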

When you have multiple conditions where, say, at least two need to match, you can adjust the score with an initial negative:


:0
* -1^0
* 1^0 word1
* 1^0 word2
* 1^0 word3

Since a score > 0 is a match, starting at -1 means that at least two of the conditions need to match to make the total positive (assuming no condition scores higher than 1).


:0
* -1^0
* 1^0 word1
* 1^0 word2
* 2^0 word3

This would require word1 & word2, *OR* word3 (with or without word1/word2).
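The arithmetic of both recipes can be checked with a short Python sketch. It assumes the scoring rule from procmailsc(5): the first match adds w, and each later match adds the previous term times x. It also relies on an empty pattern matching anything, which is what makes the bare `* -1^0` line contribute -1 unconditionally. `recipe_score` is a hypothetical helper.

```python
import re


def recipe_score(conditions, text):
    """Sum the scores of (weight, exponent, pattern) conditions.
    An empty pattern matches any text, like procmail's bare '* -1^0'."""
    total = 0
    for weight, exponent, pattern in conditions:
        term = weight
        for _ in re.finditer(pattern, text, re.IGNORECASE):
            total += term
            term *= exponent
            if term == 0:
                break  # with exponent 0 only the first hit counts
    return total


at_least_two = [(-1, 0, ""), (1, 0, "word1"), (1, 0, "word2"), (1, 0, "word3")]
two_or_word3 = [(-1, 0, ""), (1, 0, "word1"), (1, 0, "word2"), (2, 0, "word3")]

# A recipe matches when its total is > 0.
for body in ("word1", "word1 word2", "word3"):
    print(body, recipe_score(at_least_two, body) > 0, recipe_score(two_or_word3, body) > 0)
# word1 False False
# word1 word2 True True
# word3 False True
```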

From the efficiency standpoint, this is still scanning the source multiple times. I choose to apply a certain amount of manual optimization to the process rather than fret over individual processor cycles. As an example, the above condition could be written with a maximal:


:0
* -1^0
* 9876543210^0 word3
* 1^0 word1
* 1^0 word2

so that if you hit word3, you've met the conditions without needing to waste time looking for word1 or word2.
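Reordering pays off because of the short-circuit. This Python sketch uses the same assumed clamping rule that procmailsc(5) describes, and `evaluate` is a hypothetical name. It counts how many conditions are evaluated for a body containing word3. The first run uses the 2^0 weighting from the previous recipe, and the second puts the maximal weight first.

```python
import re

MAX_SCORE = 2147483647  # assumed score ceiling, per procmailsc(5)


def evaluate(conditions, text):
    """Return (matched, conditions_evaluated) for a weighted recipe.
    Each condition is (weight, exponent, pattern); the total is clamped to
    +/-MAX_SCORE and evaluation stops once it saturates."""
    total, evaluated = 0, 0
    for weight, exponent, pattern in conditions:
        evaluated += 1
        term = weight
        for _ in re.finditer(pattern, text, re.IGNORECASE):
            total += term
            term *= exponent
            if term == 0:
                break
        total = max(-MAX_SCORE, min(MAX_SCORE, total))
        if abs(total) == MAX_SCORE:
            break  # saturated: skip the remaining conditions
    return total > 0, evaluated


weighted = [(-1, 0, ""), (1, 0, "word1"), (1, 0, "word2"), (2, 0, "word3")]
maximal_first = [(-1, 0, ""), (9876543210, 0, "word3"), (1, 0, "word1"), (1, 0, "word2")]

body = "... word3 ..."
print(evaluate(weighted, body))       # (True, 4): every condition is tried
print(evaluate(maximal_first, body))  # (True, 2): stops right after word3
```

Without word3 the reordered recipe still behaves like the original: "word1 word2" evaluates all four conditions and matches.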

Of course, individual conditions might score negative as well (say, when counterbalancing: certain texts are weighted as spam, but when they actually refer to something else, they're less likely to be spam).

        Also, Sean explained how important it is to learn the characteristics
        of the messages we are going to work on before deciding on the
        algorithm:

This is also the reason you should want to manually optimize by placing the MOST LIKELY condition FIRST in an OR condition, and the LEAST LIKELY FIRST in an AND condition.
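The ordering advice can be sketched by counting how many conditions get evaluated. This Python model makes two assumptions. Plain, unweighted procmail conditions are ANDed, and evaluation stops at the first one that fails. A maximal-score OR stops at the first one that matches. The function names and sample patterns are hypothetical, and the condition count is only a proxy for real scanning cost.

```python
import re


def and_conditions(patterns, text):
    """ANDed conditions: stop at the first pattern that fails.
    Returns (matched, conditions_evaluated)."""
    for i, pattern in enumerate(patterns, start=1):
        if not re.search(pattern, text, re.IGNORECASE):
            return False, i
    return True, len(patterns)


def or_maximal(patterns, text):
    """Maximal-score OR: stop at the first pattern that matches.
    Returns (matched, conditions_evaluated)."""
    for i, pattern in enumerate(patterns, start=1):
        if re.search(pattern, text, re.IGNORECASE):
            return True, i
    return False, len(patterns)


body = "Subject: meeting notes"
common, rare = "Subject:", "lottery"
print(and_conditions([common, rare], body))  # (False, 2): the common test was wasted
print(and_conditions([rare, common], body))  # (False, 1): least likely first fails fast
print(or_maximal([rare, common], body))      # (True, 2)
print(or_maximal([common, rare], body))      # (True, 1): most likely first wins fast
```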


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
