At 22:13 2002-10-10 +0221, Udi Mottelo wrote:
It makes sense, but I expected procmail to check the input against
every regexp in the recipe in a single pass. For example, if we have
the recipe:
[snip]
Separate _condition_ lines are separate expressions and are evaluated
independently. I suggest you enable VERBOSE logging and look at the log
output for recipes written with scoring vs. a composite OR. It may help
you understand the differences.
In fact, if you're truly interested in the performance difference, write
yourself two (or more) separate procmail scripts and run them under a
sandbox config:
time formail -s procmail -m testing.rc < hugemailbox
Run this four times for each filter and discard the longest and shortest
times, since cached and uncached runs will differ.
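A minimal Python sketch of that measuring procedure. It assumes formail and procmail are on PATH. It records only wall-clock ("real") time, not the user/sys split that the shell's `time` reports. The function names are hypothetical helpers, not part of any procmail tooling.

```python
import subprocess
import time


def time_runs(cmd, stdin_path, runs=4):
    """Run `cmd` `runs` times, feeding it `stdin_path`; return wall-clock seconds per run."""
    times = []
    for _ in range(runs):
        # Reopen the mailbox each time so every run reads it from the start.
        with open(stdin_path, "rb") as mailbox:
            start = time.perf_counter()
            subprocess.run(cmd, stdin=mailbox, stdout=subprocess.DEVNULL)
            times.append(time.perf_counter() - start)
    return times


def trimmed_mean(times):
    """Discard the single shortest and longest run, then average the rest."""
    if len(times) < 3:
        raise ValueError("need at least 3 runs to drop both extremes")
    middle = sorted(times)[1:-1]
    return sum(middle) / len(middle)


# Usage, assuming formail and procmail are on PATH and reusing the file
# names from the command line above:
#   runs = time_runs(["formail", "-s", "procmail", "-m", "testing.rc"], "hugemailbox")
#   print(runs, trimmed_mean(runs))
```

Each filter should be timed the same way, so that the formail and sandbox overhead stays constant across the comparison.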
For example:
:0B
* 1^0 the
* 1^0 web
* 1^0 mail
{
LOG="found$NL"
}
real 1m01.730s 1m03.665s 1m02.889s 1m02.995s
user 0m24.350s 0m24.820s 0m24.760s 0m24.470s
sys 0m19.420s 0m19.890s 0m19.520s 0m20.040s
:0B
* 1^1 the
* 1^1 web
* 1^1 mail
{
LOG="found$NL"
}
real 1m01.462s 1m01.289s 1m01.689s 1m01.416s
user 0m24.010s 0m22.580s 0m23.780s 0m23.550s
sys 0m19.170s 0m20.510s 0m19.680s 0m19.610s
:0B
* (the|web|mail)
{
LOG="found$NL"
}
real 1m00.646s 1m01.108s 1m06.578s 1m02.688s
user 0m23.290s 0m23.700s 0m23.620s 0m23.640s
sys 0m19.180s 0m18.870s 0m19.740s 0m19.820s
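One way to apply the "discard the longest and shortest" rule to the real times above is sketched below. The stamps are copied verbatim from the three result sets. The parsing assumes the `XmYY.YYYs` format shown in the results.

```python
import re


def to_seconds(stamp):
    """Convert a time(1)-style stamp such as '1m01.730s' to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


def trimmed_real(row):
    """Drop the shortest and longest of the runs, average the remainder."""
    times = sorted(to_seconds(stamp) for stamp in row.split())
    middle = times[1:-1]
    return sum(middle) / len(middle)


# "real" rows copied from the three sets of results above.
REAL = {
    "* 1^0 scoring": "1m01.730s 1m03.665s 1m02.889s 1m02.995s",
    "* 1^1 scoring": "1m01.462s 1m01.289s 1m01.689s 1m01.416s",
    "(the|web|mail)": "1m00.646s 1m01.108s 1m06.578s 1m02.688s",
}

for recipe, row in REAL.items():
    print(f"{recipe:15} {trimmed_real(row):.3f} s")
```

On these numbers the trimmed means come out at 62.942 s, 61.439 s and 61.898 s. That is a spread of roughly 1.5 s on a run of about a minute, which fits the point that the difference is nominal. The user column could be treated the same way.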
All of the above include certain overheads associated with the formail
invocation and the sandbox config, but those overheads are the same for
each configuration. That was processed against a saved mailbox of spam and
crap about 6.1MB in size. Note the nominal difference between execution
times. There'd probably be a more significant difference on a slower
machine, but all told, the difference isn't large.
Of course, actual results will vary depending on whether your keywords
appear in your messages, and on the size of the messages themselves.
When procmail meets the first "a", it tries to compare it against all
the regexps that start with "a" ("aq1aw3" and "abcd"). Doesn't it
work like this in the (abcd|xyz|aq1aw3) case?
The (string|string|string) expression would stop on the FIRST match of any
one of the strings and would not need to resolve the others. The
MULTI-LINE conditionals, however, insist on checking EACH one.
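The toy Python model below illustrates that difference. It is not procmail's code. It rests on two assumptions, based on procmailsc(5): the n-th match of a `* w^x regex` condition adds w*x^(n-1) to the score, and the recipe fires when the total is positive. It also assumes case-insensitive, non-overlapping matches, which is procmail's default without the D flag. The sample body is made up.

```python
import re

# Hypothetical sample body (the B flag makes procmail search the body).
BODY = "The web server forwarded the mail to the web admin."


def count_matches(pattern, text):
    # Non-overlapping, case-insensitive matches; procmail ignores case
    # unless the recipe carries the D flag.
    return sum(1 for _ in re.finditer(pattern, text, re.IGNORECASE))


def weighted_score(conditions, text):
    """Sum w + w*x + w*x**2 + ... over each condition's n matches.

    conditions: list of (w, x, regex) tuples, one per `* w^x regex` line.
    Every condition is evaluated on its own, so every regex scans the text.
    """
    total = 0
    for w, x, pattern in conditions:
        n = count_matches(pattern, text)
        total += sum(w * x**k for k in range(n))  # x**0 == 1, even for x == 0
    return total


def alternation_matches(pattern, text):
    # One regex, one search: re.search stops at the first hit.
    return re.search(pattern, text, re.IGNORECASE) is not None


first_only = [(1, 0, "the"), (1, 0, "web"), (1, 0, "mail")]  # * 1^0 lines
every_hit = [(1, 1, "the"), (1, 1, "web"), (1, 1, "mail")]   # * 1^1 lines

print(weighted_score(first_only, BODY))              # 3
print(weighted_score(every_hit, BODY))               # 6
print(alternation_matches(r"(the|web|mail)", BODY))  # True
```

With 1^0 each condition contributes at most 1, while 1^1 counts every occurrence. The alternation only needs its first hit ("The" at offset 0) to decide.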
There are a variety of string-search algorithms out there. Boyer-Moore,
which is optimized for longer search strings, is probably one of the
better-known ones. I'm not familiar with the regexp code in procmail, but
you can figure that, being regexp-based, it is going to "scan" the text
several times during processing.
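The post doesn't say which algorithm procmail's regexp code uses, and the sketch below is not it. It is Boyer-Moore-Horspool, a simplified Boyer-Moore variant that keeps only the bad-character shift table. It shows why such searches favour longer patterns. `horspool_find` is a hypothetical name.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # For each character in pattern[:-1], how far the window may slide when
    # that character sits under the pattern's last position. Later indices
    # overwrite earlier ones, so each character keeps its smallest shift.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        # Compare right to left, as Boyer-Moore does.
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            return i
        # Characters absent from the pattern let the window jump m places.
        i += shift.get(text[i + m - 1], m)
    return -1


print(horspool_find("spam and crap", "crap"))  # 9
```

When the character under the window's last position doesn't occur in the pattern, the window skips the full pattern length. A longer pattern therefore tends to examine fewer positions of the text.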
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail