procmail

Re: Score and _AND_

2002-10-10 16:38:19
At 22:13 2002-10-10 +0221, Udi Mottelo wrote:
        It makes sense, but, I expected that procmail checks the input
        against every regex from the recipe in one phase, for example,
        if we have recipe:
[snip]

Separate _condition_ lines are separate expressions and are independently evaluated. I suggest you try enabling VERBOSE logging and taking a look at the results of recipes written with scoring vs. a composite OR. It might help you see the differences.
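
As a rough sketch of what enabling verbose logging in a sandbox might look like, the shell function below writes a small rcfile with VERBOSE switched on and one scoring recipe to watch. The names testing.rc and testing.log, the DEFAULT=/dev/null setting, and the NL definition are assumptions made for illustration, so adjust them to your own setup:

```shell
#!/bin/sh
# Write a hypothetical sandbox rcfile that turns on procmail's extended
# diagnostics. File names and the DEFAULT setting are illustrative assumptions.
write_sandbox_rc() {
    rc=${1:-testing.rc}
    # The quoted 'EOF' keeps the shell from expanding $NL inside the rcfile.
    cat > "$rc" <<'EOF'
# Extended diagnostics: procmail reports on each condition line.
VERBOSE=yes
# Send the diagnostics to a file instead of stderr (hypothetical name).
LOGFILE=testing.log
# Sandbox assumption: anything no recipe handles is thrown away.
DEFAULT=/dev/null
# A literal newline, so LOG="found$NL" ends its line in the log.
NL="
"

:0B
* 1^0 the
* 1^0 web
* 1^0 mail
{
        LOG="found$NL"
}
EOF
}

write_sandbox_rc testing.rc
```

Feed a message through procmail -m testing.rc, then swap the three scoring lines for a single (the|web|mail) condition, run it again, and compare the two logs.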

In fact, if you're truly interested in the performance difference, write yourself two (or more) separate procmail scripts and run them under a sandbox config:

        time formail -s procmail -m testing.rc < hugemailbox

Run this four times on each filter and discard the longest and shortest time results - cached and noncached results being what they are.
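
If you'd rather not do the repetition by hand, something along these lines could do it. This is only a sketch: time_runs and drop_extremes are helper names I made up, and date +%s gives whole-second resolution only, which is crude but adequate for runs that take about a minute:

```shell
#!/bin/sh
# time_runs COUNT COMMAND [ARGS...]
# Run COMMAND COUNT times and print the elapsed wall-clock seconds of each
# run, one per line. Resolution is whole seconds (date +%s).
time_runs() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        echo $((end - start))
        i=$((i + 1))
    done
}

# drop_extremes: read one number per line, discard the smallest and the
# largest, and print what is left in ascending order.
drop_extremes() {
    sort -n | sed '1d;$d'
}
```

A run against one filter might then look like this (the rcfile and mailbox names are the ones from the command above):

        time_runs 4 sh -c 'formail -s procmail -m testing.rc < hugemailbox' | drop_extremes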

For example:

:0B
* 1^0 the
* 1^0 web
* 1^0 mail
{
        LOG="found$NL"
}

real    1m01.730s    1m03.665s    1m02.889s    1m02.995s
user    0m24.350s    0m24.820s    0m24.760s    0m24.470s
sys     0m19.420s    0m19.890s    0m19.520s    0m20.040s


:0B
* 1^1 the
* 1^1 web
* 1^1 mail
{
        LOG="found$NL"
}

real    1m01.462s    1m01.289s    1m01.689s    1m01.416s
user    0m24.010s    0m22.580s    0m23.780s    0m23.550s
sys     0m19.170s    0m20.510s    0m19.680s    0m19.610s


:0B
* (the|web|mail)
{
        LOG="found$NL"
}

real    1m00.646s    1m01.108s    1m06.578s    1m02.688s
user    0m23.290s    0m23.700s    0m23.620s    0m23.640s
sys     0m19.180s    0m18.870s    0m19.740s    0m19.820s



All of the above include certain overheads associated with the formail invocation and the sandbox config - but those overheads are consistent between each configuration. These runs were processed against a saved mailbox of spam and crap about 6.1MB in size. Note the nominal difference between execution times. There'd probably be a more significant difference on a slower machine, but all told, the difference isn't large.

Of course, the actual results will vary, depending on whether your keywords appear in your messages and on the size of the messages themselves.
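
To put a number on "nominal", here is a small sketch that applies the discard-the-extremes rule to the "real" rows posted above and averages the two runs that remain. middle_mean is a made-up helper name, and it assumes the NmSS.SSSs format that time printed here:

```shell
#!/bin/sh
# middle_mean TIME...: take "real" times in the 1m01.730s form, convert them
# to seconds, drop the fastest and the slowest, and print the mean of the rest.
middle_mean() {
    for t in "$@"; do
        echo "$t"
    done |
    awk -F'[ms]' '{ print $1 * 60 + $2 }' |
    sort -n |
    awk '{ v[NR] = $1 }
         END {
             if (NR < 3) { print "need at least 3 samples"; exit 1 }
             for (i = 2; i < NR; i++) sum += v[i]
             printf "%.3f\n", sum / (NR - 2)
         }'
}

middle_mean 1m01.730s 1m03.665s 1m02.889s 1m02.995s   # 1^0 scoring   -> 62.942
middle_mean 1m01.462s 1m01.289s 1m01.689s 1m01.416s   # 1^1 scoring   -> 61.439
middle_mean 1m00.646s 1m01.108s 1m06.578s 1m02.688s   # (the|web|mail) -> 61.898
```

By that measure the three variants sit within about a second and a half of each other on runs of roughly a minute.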

        When procmail meets the first "a" it tries to compare against
        all the regexes that start with "a" ("aq1aw3" and "abcd").  Doesn't
        it work like this in the (abcd|xyz|aq1aw3) case?

The (string|string|string) expression stops on the FIRST match of any one of the strings and then doesn't need to resolve the others. The MULTI-LINE conditions, however, insist on checking EACH one.
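
The same contrast can be acted out in plain shell, with grep standing in for procmail's matcher. This is only an analogy and says nothing about procmail's internals; any_keyword and score_keywords are names invented for the sketch, and -i is there because procmail matches case-insensitively by default:

```shell
#!/bin/sh
# any_keyword TEXT: succeed if TEXT contains at least one keyword, using a
# single alternation. grep -q quits as soon as it has found a match.
any_keyword() {
    printf '%s\n' "$1" | grep -Eiq '(the|web|mail)'
}

# score_keywords TEXT WORD...: one separate search per WORD, every one of
# them run, adding 1 to the score for each WORD found at least once
# (roughly what a "* 1^0 word" condition line contributes).
score_keywords() {
    text=$1
    shift
    score=0
    for word in "$@"; do
        if printf '%s\n' "$text" | grep -Fiq -- "$word"; then
            score=$((score + 1))
        fi
    done
    echo "$score"
}

msg='check your web mail today'
any_keyword "$msg" && echo "alternation: match"
echo "score: $(score_keywords "$msg" the web mail)"
```

This prints "alternation: match" and "score: 2": the alternation is satisfied by the first keyword it finds, while the loop performs all three searches regardless of how the earlier ones turned out.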

There are a variety of string-search algorithms out there. Boyer-Moore is probably one of the better-known ones, and it is optimized for longer search strings. I'm not familiar with the regexp code in procmail, but you can figure the regexp nature of it means it's going to "scan" the text several times during processing.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.
