Re: Score and _AND_

At 22:13 2002-10-10 +0221, Udi Mottelo wrote:

        It makes sense, but I expected that procmail checks the input
        against every regex from the recipe in one pass. For example,
        if we have the recipe:

[snip]

Separate _condition_ lines are separate expressions and are independently evaluated. I suggest you try enabling VERBOSE LOGGING and taking a look at the results of recipes written with scoring vs. a composite OR. It might help you comprehend the differences.

In fact, if you're truly interested in the performance difference, write yourself two (or more) separate procmail scripts and run them under a sandbox config:


        time formail -s procmail -m testing.rc < hugemailbox

Run this four times on each filter and discard the longest and shortest time results - cached and noncached results being what they are.


For example:

:0B
* 1^0 the
* 1^0 web
* 1^0 mail
{
        LOG="found$NL"
}

real    1m01.730s    1m03.665s    1m02.889s    1m02.995s
user    0m24.350s    0m24.820s    0m24.760s    0m24.470s
sys     0m19.420s    0m19.890s    0m19.520s    0m20.040s


:0B
* 1^1 the
* 1^1 web
* 1^1 mail
{
        LOG="found$NL"
}

real    1m01.462s    1m01.289s    1m01.689s    1m01.416s
user    0m24.010s    0m22.580s    0m23.780s    0m23.550s
sys     0m19.170s    0m20.510s    0m19.680s    0m19.610s


:0B
* (the|web|mail)
{
        LOG="found$NL"
}

real    1m00.646s    1m01.108s    1m06.578s    1m02.688s
user    0m23.290s    0m23.700s    0m23.620s    0m23.640s
sys     0m19.180s    0m18.870s    0m19.740s    0m19.820s

All of the above include certain overheads associated with the formail invocation and the sandbox config - but those overheads are consistent between each configuration. That was processed against a saved mailbox of spam and crap about 6.1MB in size. Note the nominal difference between execution times. There'd probably be a more significant difference on a slower machine, but all told, the difference isn't

Of course, the actual results will vary with whether or not your keywords appear in your messages, as well as with the size of the messages themselves.
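
If you'd rather not eyeball the numbers, the "discard the longest and shortest" rule can be applied mechanically. What follows is only a minimal Python sketch, not part of the original test run: the function names and the recipe labels in the dictionary are my own, and the values are the "real" rows copied from the three tables above.

```python
def parse_time(text):
    """Convert a shell `time` value such as '1m01.730s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def trimmed_mean(samples):
    """Drop the single shortest and single longest run, average the rest."""
    if len(samples) < 3:
        raise ValueError("need at least three runs to trim both extremes")
    kept = sorted(samples)[1:-1]
    return sum(kept) / len(kept)


# "real" rows from the three runs above; the labels are mine.
real_times = {
    "1^0 scoring": ["1m01.730s", "1m03.665s", "1m02.889s", "1m02.995s"],
    "1^1 scoring": ["1m01.462s", "1m01.289s", "1m01.689s", "1m01.416s"],
    "(the|web|mail)": ["1m00.646s", "1m01.108s", "1m06.578s", "1m02.688s"],
}

for label, runs in real_times.items():
    seconds = [parse_time(r) for r in runs]
    print(f"{label:16s} {trimmed_mean(seconds):.3f}s")
```

With only four runs per recipe this leaves two samples each, so treat the averages as a rough comparison and nothing more.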

        When procmail meets the first "a" it tries to compare against
        all the regexes that start with "a" ("aq1aw3" and "abcd").  Doesn't
        it work like this in the (abcd|xyz|aq1aw3) case?

The (string|string|string) expression would stop on the FIRST match of any one of the strings and then not need to resolve for the others. The MULTI-LINE conditionals however insist on checking EACH one.
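
To make the arithmetic concrete: per the procmailsc man page, a "w^x" condition adds w for the first match, w*x for the second, w*x*x for the third and so on, and the recipe matches if the final score is positive. Here is a rough Python model of just that arithmetic. It is NOT procmail's code; Python's re module stands in for procmail's own regexp engine (an assumption), and the body text and function names are made up for illustration.

```python
import re


def condition_score(w, x, pattern, text):
    """Model one `w^x pattern` line: the n-th match adds w * x**(n-1)."""
    matches = len(re.findall(pattern, text, flags=re.IGNORECASE))
    return sum(w * x ** n for n in range(matches))


def recipe_score(conditions, text):
    """Each condition line is evaluated on its own; contributions are summed."""
    return sum(condition_score(w, x, pat, text) for w, x, pat in conditions)


# Made-up body text, only to have something to count.
body = "The web mail gateway forwards the mail to the web host."

first_only = [(1, 0, "the"), (1, 0, "web"), (1, 0, "mail")]  # the 1^0 recipe
every_hit = [(1, 1, "the"), (1, 1, "web"), (1, 1, "mail")]   # the 1^1 recipe

print("1^0 score:", recipe_score(first_only, body))
print("1^1 score:", recipe_score(every_hit, body))
# The composite OR only needs a single hit anywhere in the text.
print("OR matched:", bool(re.search("(the|web|mail)", body, flags=re.IGNORECASE)))
```

In this model the 1^0 lines contribute at most 1 each, while the 1^1 lines add 1 for every occurrence. All three forms end up delivering to the same action, which fits with the timings being so close.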

There are a variety of string-search algorithms out there. Boyer-Moore is probably one of the better known ones, which is optimized for longer search strings. I'm not familiar with the regexp code in procmail, but you can figure the regexp nature of it is going to "scan" the text several times during processing.
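
As an illustration of why longer search strings help, here is a minimal Python sketch of the Horspool simplification of Boyer-Moore (not the full algorithm, and certainly not what procmail does internally; the function name is my own). It only handles literal strings, not regexps.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # For every pattern character except the last: distance from its
    # rightmost occurrence to the end of the pattern.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            return pos
        # Shift by the text character under the pattern's last position;
        # characters not in the pattern allow a full jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1


print(horspool_find("a saved mailbox of spam and crap", "crap"))
print(horspool_find("a saved mailbox of spam and crap", "procmail"))
```

The longer the pattern, the larger the jump when the text character under its last position doesn't occur in it, so fewer positions get examined.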


---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
