lockfile on INCLUDERC to throttle?

1999-12-10 22:13:54

I have some, uhm, memory-intensive filter processes: fgrep'ing a bunch of headers against a 3MB datafile, which for some reason causes fgrep to immediately use anywhere from 15 to 45 MB of RAM. The server can only handle so many of these processes concurrently without becoming bogged down (14 seems to be the magic number, though that is with the various fgrep instances using varying amounts of memory). Once physical memory is exhausted and the system resorts to using swap extensively, performance plummets like a skydiver without a parachute.

Until just recently, I had a LOCKFILE defined at the top of .procmailrc, and this seemed to work, although because my individual recipes were written to act standalone (specifying locking where it would normally be appropriate), I was seeing extraneous lockfile warnings in the log, which I didn't like.

If I fetchmailed from someplace where there were a lot of messages, or if my connection was down, and I received a flurry of messages when the connection was restored, the global lockfile would throttle the processing to a single message at a time, which would keep my procmail filtering from pigging out on system resources.

Recently, while revising some filters, I thought I'd try limiting the lockfile to just the rules which were pigging out (that is, multiple messages could be processed concurrently -- but only ONE message at any given time could be in the process of running the spam checking portion).

As a recent 64 message fetchmail process confirmed, this didn't work as expected.

I'm thinking of writing my own support utility to do the header greppage: db-ify the domain list (the 3MB file), then have the utility collect a list of domain-looking components from the headers and perform a couple of fast db lookups. I'm not using regexps per se, but fgrep is probably gobbling up memory on the assumption that it may have to parse the file in a complex fashion (dunno, just a stab in the dark).
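
As a rough sketch of that lookup-utility idea (in Python, purely for illustration): the function names, the one-domain-per-line format of the list file, and the "domain-looking" regular expression below are all assumptions, not anything taken from the actual setup. Note also that it does exact lookups on each candidate domain and its parent domains, which is not the same as fgrep's substring matching.

```python
import dbm
import re

# Hypothetical pattern for "domain-looking" tokens: dot-separated labels
# ending in an alphabetic top-level label. ASCII-only on purpose.
DOMAIN_RE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}\b",
    re.IGNORECASE | re.ASCII,
)


def build_db(list_path, db_path):
    """Load a text file (assumed: one domain per line) into a dbm database."""
    with open(list_path, encoding="ascii", errors="ignore") as src, \
            dbm.open(db_path, "n") as db:
        for line in src:
            domain = line.strip().lower()
            if domain and not domain.startswith("#"):
                db[domain.encode("ascii")] = b"1"


def candidate_domains(headers):
    """Collect domain-looking tokens, plus their parent domains."""
    found = set()
    for match in DOMAIN_RE.findall(headers):
        labels = match.lower().split(".")
        # mail.example.com also yields example.com (but not the bare "com")
        for i in range(len(labels) - 1):
            found.add(".".join(labels[i:]))
    return found


def listed_domains(headers, db_path):
    """Return the candidates from the headers that are present in the database."""
    with dbm.open(db_path, "r") as db:
        return sorted(
            d for d in candidate_domains(headers) if d.encode("ascii") in db
        )
```

The database would be rebuilt with build_db() only when the domain list changes; each message then costs a handful of keyed lookups instead of a scan of the whole 3MB file. Whether this actually cures the memory problem is untested here.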

If I do this rewrite, it'll probably largely resolve the speed and resources issue. However, I'd still like to know how to properly wrap an INCLUDERC in a lockfile.

Version specifics:
        Sendmail 8.10.0Beta6
        Procmail 3.11pre7 (dotlocking, fcntl(), lockf())
        Fetchmail 5.2.0
        GNU Grep 2.0


The old scheme had:

        LOCKFILE=${HOME}/.procmail.global-procmail.lock

at the top of .procmailrc

and:

        LOCKFILE

at the end. The rc would include a number of other rulesets, among them spam and twit filtering, which are the two pigs (spam foremost).


The revised scheme was to comment out the above two lines and do the following instead (this occurs within a ruleset that determines whether the message headers meet some whitelist criteria, and skips the aggressive rules if so):


LOCKFILE=$TEMP/spamsrc$LOCKEXT

:0
* ! $? $FORMAIL -ISubject: | $FGREP -i -f $NOSPAMLIST
{
        INCLUDERC=$PMDIR/spam/spam.rc
}

LOCKFILE

(THIS fgrep isn't the one that pigs out though, because the whitelisting file is quite small - representing user addresses, hosts, or mailing lists which might otherwise be clobbered by the spam filtering).


Placing a lockfile on the recipe itself (:$TEMP/spamsrc$LOCKEXT on the flags line) doesn't seem to accomplish anything at all, except to ensure that I get more extraneous lockfile messages. This LOCKFILE mechanism, by contrast, SEEMED to be doing something when I originally set it up, but perhaps I'm mistaken.

If it is something which would be addressed by a later version of Procmail, I'd gladly upgrade to accomplish this.

Pointers greatly appreciated.

---
 Please DO NOT carbon me on list replies.  I'll get my copy from the list.

 Sean B. Straw / Professional Software Engineering
 Post Box 2395 / San Rafael, CA  94912-2395
