Re: Not scaling well

2003-09-16 11:59:26
At 09:25 2003-09-15 -0700, Enzo wrote:
Wondering if anyone sees any inefficiencies in this beast.

Yeah, (e)grep -f sucks up memory like there's no tomorrow, probably because of the number of evaluation trees it has to build to compare against even a moderately sized file.

# Test if the email's sender is in the user-defined whitelist; if so, deliver it.
:0
* ? formail -x"From" -x"From:" -x"Sender:" -x"Reply-To:" -x"Return-Path:" -x"To:" | egrep -is -f

If the included file is just a series of plain-text keywords, NOT regexps, then try 'fgrep' (fixed-string matching) instead of egrep. You'll find it makes a huge difference in memory and CPU demands.
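A minimal sketch of the fixed-string approach, for anyone who wants to try it outside procmail first. The whitelist contents and the `is_whitelisted` helper are made up for illustration; `grep -F` is the same matcher as `fgrep`, and I use `-q` here where the recipe used `-s`.

```shell
# Hypothetical whitelist: one literal address per line, no regexps.
whitelist=$(mktemp)
trap 'rm -f "$whitelist"' EXIT
printf '%s\n' 'alice@example.com' 'bob@example.org' > "$whitelist"

# Reads header text on stdin; succeeds if any listed string occurs in it.
# -F fixed strings, -i ignore case, -q no output, -f patterns from a file.
is_whitelisted() {
    grep -F -i -q -f "$whitelist"
}

if printf 'From: Alice <ALICE@example.com>\n' | is_whitelisted; then
    result=match
else
    result=nomatch
fi
echo "$result"
```

Because the patterns are literal, the dot in an address only matches a dot. One thing to watch: an empty line in the pattern file matches every input line.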

 /usr/local/apache/htdocs/secure/usermaint/nobounce/${USER}

Might I suggest linking that to some more manageable location?

# Get the sender's address; discard any leading and trailing whitespace
FROM_=`formail -rt -xTo: \
  | expand | sed -e 's/^[ ]*//g' -e 's/[ ]*$//g'`

OMG.  Why all the cruft?  I see a shell, formail, expand, and sed.

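Check out '-z' in the formail manpage: it zaps leading and trailing whitespace on fields extracted with -x, so formail can do the whole job by itself (i.e. "formail -rtzxTo:").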
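A rough way to try that from a shell prompt, assuming formail is installed. The sample message is invented, and the guard is only there so the snippet does not fall over on a box without procmail.

```shell
# Invented sample message, for illustration only.
msg='From: Alice <alice@example.com>
To: bob@example.org
Subject: hi

body'

if command -v formail >/dev/null 2>&1; then
    # -r build a reply header, -t trust the sender fields,
    # -z trim whitespace on extracted fields, -x To: extract that field.
    sender=$(printf '%s\n' "$msg" | formail -rtzxTo:)
else
    sender='(formail not installed)'
fi
echo "sender=[$sender]"
```

The brackets in the output make any stray leading or trailing whitespace easy to spot.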


#Return certain blacklisted email
:0
* ? formail -x"From" -x"From:" -x"Sender:" -x"Reply-To:" -x"Return-Path:" -x"To:" | egrep -is -f /usr/local/apache/htdocs/secure/usermaint/blacklist/${USER}
# Avoid forgeries that pretend to be from my own site
* ! $ ? echo ${FROM_} | fgrep -is 'boothcreek.com'
* $ ? echo ${FROM_} | fgrep -is '.'
* $ ? echo ${FROM_} | fgrep -is '@'
# Avoid email loops
* ! ^X-Loop: postmaster@mydomain\.com

Check for the loop *FIRST*. Basically, when dealing with AND'ed conditions, ALWAYS test the ones that cost the least CPU first. That way, when a message fails, it fails before you've thrown a lot of CPU at it. In this case, nearly all of your conditions should be in reverse order.
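The same idea as a shell analogue (this is not procmail syntax, just a sketch of the short-circuit behaviour). The header text, the function names, and the counter are all hypothetical; `is_blacklisted` stands in for the expensive formail/egrep condition.

```shell
# Hypothetical headers of a message that already carries our X-Loop header.
msg_headers='X-Loop: postmaster@mydomain.com
From: someone@example.net'

expensive_calls=0

# Cheap test: a builtin pattern match, no external process.
has_no_loop_header() {
    case "$msg_headers" in
        *'X-Loop: postmaster@mydomain.com'*) return 1 ;;
        *) return 0 ;;
    esac
}

# Stand-in for the costly test: spawns grep and counts how often it ran.
is_blacklisted() {
    expensive_calls=$((expensive_calls + 1))
    printf '%s\n' "$msg_headers" | grep -F -i -q 'example.net'
}

# Cheap test first: when it fails, the costly one never runs.
if has_no_loop_header && is_blacklisted; then
    verdict=bounce
else
    verdict=skip
fi
echo "$verdict $expensive_calls"   # prints "skip 0"
```

Swap the two tests around and the counter goes to 1 for a message that was going to be skipped anyway.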

What's up with all the echo | fgrep -- you're invoking a shell and two programs for each of those, when:

        * ! FROM_ ?? boothcreek\.com
        * FROM_ ?? \.
        * FROM_ ?? @

Would do (though I don't comprehend the reason for the latter two expressions - I guess you're requiring that there be an address separator and a dot, but there are other expressions you could use for that). The method I present saves you THREE shells, three echos, and three fgrep invocations, because the whole thing is performed internally to procmail. Individually, those invocations don't seem like much, but in overall processor time, they add up.
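For comparison, the shell has the same kind of choice between spawning programs and matching internally. The sketch below is an analogue, not procmail: `addr_passes` and the sample addresses are made up, and unlike procmail's default matching, `case` is case-sensitive.

```shell
# Succeeds when the address does not mention boothcreek.com and contains
# both a dot and an at sign. Uses only shell builtins: no echo, no fgrep.
addr_passes() {
    case "$1" in
        *boothcreek.com*) return 1 ;;
    esac
    case "$1" in
        *.*@* | *@*.*) return 0 ;;
    esac
    return 1
}

for a in 'user@example.com' 'forged@boothcreek.com' 'nodomain'; do
    if addr_passes "$a"; then
        echo "$a: pass"
    else
        echo "$a: fail"
    fi
done
```

Each `echo | fgrep` version of those tests would cost a fork and an exec per test; the builtin match costs neither.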

[snip]

I don't presently have the time to evaluate what it is you're trying to accomplish with the lock stuff and temporary files, but let me tell ya, EVERY message that has to be processed by that recipe is going to hold up ALL your other messages (through that recipe) until it finishes. Basically, that email is being handled single file.

---
 Sean B. Straw / Professional Software Engineering

 Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail