At 00:26 2002-06-21 +0200, Eric Smith did say:
> I am using egrep -if to test for the presence of email addresses
> in a whitelist. Once I get to about 200 entries in my whitelist,
> then the process time is about 4 seconds but doubles with
> 250 entries.
It's a known issue that grep pigs out on memory. You should see what grep
will do to a system when your -f expressionlist file is 3+ MB in size, and
you're using -w on the input. Now, contemplate doing THAT on several
thousand messages a day.
> Whitelist could easily grow to thousands of entries.
My blacklist of domains (separate from the DNSBL I maintain, which is
handled at the MTA) is 202K+ lines. It takes my megagrep tool about 3
seconds to scan the whole list (which needs to be loaded into memory and
parsed into a tree) against a message of just about any size. Megagrep is
optimized for the way I parse headers: I match word tokens.
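To illustrate the word-token idea, here is a minimal Python sketch. It is
not the megagrep source: a hash set stands in for the tree, and the names
load_tokens, header_hits and TOKEN_RE, along with the tokenizing pattern,
are assumptions made up for the example. The list is loaded once, the
headers are split into tokens, and each token is tested for membership, so
the work per message tracks the size of the message rather than the size of
the list.

```python
import re

# Hypothetical tokenizer: runs of characters that can appear in a hostname
# or an address local part. Real header parsing would be stricter.
TOKEN_RE = re.compile(r"[A-Za-z0-9._-]+")

def load_tokens(lines):
    """Build the in-memory lookup set from an iterable of list entries."""
    return {line.strip().lower() for line in lines if line.strip()}

def header_hits(headers, listed):
    """Return the listed tokens that occur in the header text, sorted."""
    # Lowercase each token and drop stray leading/trailing dots.
    tokens = {t.strip(".").lower() for t in TOKEN_RE.findall(headers)}
    return sorted(tokens & listed)
```

Note that this matches whole tokens only, so a listed domain does not match
its subdomains unless they are listed too.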
I archived the source code off someplace else (just the compiled binary is
on my servers right now), but it's based on some other material of mine,
and is a compiled C++ tool.
> Someone suggests look, which does binary searches; this looks
> like a good solution, but it does not support regexes.
My AVL tree code doesn't do regexps (well, I could link in a regexp library
and diddle with that, but then the AVL tree wouldn't gain me much).
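For the plain-string case, the binary-search idea can be sketched in a few
lines of Python. This is an illustration only, not the look utility or the
AVL code: it uses the standard bisect module, does exact matching, and the
function name in_sorted_list is invented here. The list is assumed to be
lowercased and sorted ahead of time.

```python
from bisect import bisect_left

def in_sorted_list(sorted_entries, address):
    """Exact-match lookup by binary search; no regexp support."""
    key = address.strip().lower()
    # bisect_left returns the index where key would be inserted.
    i = bisect_left(sorted_entries, key)
    return i < len(sorted_entries) and sorted_entries[i] == key
```

A lookup like this needs on the order of log2(n) comparisons, which is
around 18 for a list of 200K entries.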
> What is the best solution here?
One approach might be to maintain two separate greenlist files: one of
plain non-regexp strings, and one (presumably much smaller) of regexps.
Don't hand grep anything more than you want it to be parsing.
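The two-list split might look something like the following Python sketch.
The function names load_lists and is_greenlisted are hypothetical, and
Python's re syntax is assumed for the regexp list, which is not identical
to egrep's. Plain entries go into a set for a cheap exact lookup, and only
addresses that miss there are run against the short regexp list.

```python
import re

def load_lists(plain_lines, regexp_lines):
    """Return (set of plain strings, list of compiled regexps)."""
    plain = {s.strip().lower() for s in plain_lines if s.strip()}
    patterns = [re.compile(s.strip(), re.IGNORECASE)
                for s in regexp_lines if s.strip()]
    return plain, patterns

def is_greenlisted(address, plain, patterns):
    """Try the cheap exact lookup first, then the short regexp list."""
    addr = address.strip().lower()
    if addr in plain:
        return True
    # search() matches anywhere in the string, as grep does for a line.
    return any(p.search(addr) for p in patterns)
```

With this arrangement the large list costs one hash lookup per address, and
the regexp cost grows only with the number of regexps.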
The alternative approach is to write a specialized tool that does little
more than EXACTLY what you need it to do, and to do that extremely efficiently.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.