procmail

Re: Trouble with procmail v3.13.1: dot file locking

1999-05-18 09:31:06
Hello Bennett and the others,

"BT" == Bennett Todd <bet(_at_)newritz(_dot_)mordor(_dot_)net> writes:

BT> 1999-05-18-12:37:52 Gjermund Sørseth:
BT> > Consider how procmail dot-locks a mailbox: first it creates a file
BT> > with a unique name, something like /var/mail/_QVETS. Then it tries to
BT> > hardlink /var/mail/user.lock to this file. If the user.lock file already
BT> > exists (because the user is already receiving some mail), then procmail
BT> > unlinks the temporary file, sleeps for 8 seconds and tries again.
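The scheme quoted above can be sketched in C roughly as follows. This is a minimal illustration, not procmail's source: `dotlock_acquire()`, `dotlock_release()` and the `_lk<pid>` scratch-file name are hypothetical, and the retry count and sleep interval are parameters (the quoted text says procmail sleeps 8 seconds between tries).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper, not procmail's code: acquire `lockpath` by hard-linking
 * a uniquely named scratch file in `dir` (same directory, hence same
 * filesystem, so link() can work) to it.  Makes up to `max_tries` attempts,
 * sleeping `sleep_secs` between them.  Returns 0 once the lock is ours,
 * -1 on error or when we give up. */
static int dotlock_acquire(const char *dir, const char *lockpath,
                           int max_tries, unsigned sleep_secs)
{
    char tmp[4096];
    for (int attempt = 0; attempt < max_tries; attempt++) {
        /* 1. Create the scratch file; the PID makes its name unique. */
        int n = snprintf(tmp, sizeof tmp, "%s/_lk%ld", dir, (long)getpid());
        if (n < 0 || n >= (int)sizeof tmp)
            return -1;
        int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
        if (fd < 0)
            return -1;
        close(fd);

        /* 2. link() is atomic: it fails with EEXIST while lockpath exists. */
        int rc = link(tmp, lockpath);
        int saved_errno = errno;

        /* 3. Remove the scratch name whether or not the link succeeded. */
        unlink(tmp);

        if (rc == 0)
            return 0;                  /* lockpath now belongs to us */
        if (saved_errno != EEXIST)
            return -1;                 /* a real error, not contention */
        if (attempt + 1 < max_tries)
            sleep(sleep_secs);         /* lock is busy: back off, then retry */
    }
    return -1;
}

/* Release: removing the dotlock lets the next waiter's link() succeed. */
static void dotlock_release(const char *lockpath)
{
    unlink(lockpath);
}
```

Each failed attempt costs an open(), a link() and an unlink() in the spool directory, which matches the pattern seen under truss. On NFS, link() can report failure even when it succeeded; a commonly recommended refinement is to stat() the scratch file and treat st_nlink == 2 as success before unlinking it.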

I ran a number of trusses on both sendmail and procmail to see what was
going on, and now I see that your suggestions are right on track!  I was
trying to debug the slowness of our mail system, and there were 6
procmails waiting to deliver to a particular user.  I intercepted 3 of
these with truss (which traces Solaris system calls) and found 200 to 300
failed calls to

link("/var/mail/_auDQ.irit", "/var/mail/user.lock")

This implies an equivalent number of opens on /var/mail/_auDQ.irit as
well as unlinks of the same file.
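A rough estimate of what those retries cost in time, assuming each failed attempt is followed by the 8-second sleep described above (procmail's default LOCKSLEEP), that the counts are per process, and ignoring the time spent in the system calls themselves:

```latex
t_{\mathrm{wait}} \approx N_{\mathrm{fail}} \cdot t_{\mathrm{sleep}}
  = (200 \text{ to } 300) \times 8\,\mathrm{s}
  = 1600 \text{ to } 2400\,\mathrm{s}
  \approx 27 \text{ to } 40\,\mathrm{min}
```

That is, each stuck procmail may spend on the order of half an hour just sleeping between lock attempts.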

BT> And the remaining problem is, if you have thousands of procmails stacking up
BT> like airplanes outside Newark, the directory will grow large with dotlock
BT> files, and large directories introduce hideous delays under many OSes. Aside
BT> from trying not to create the dotlock file until you've gotten a kernel lock
BT> (if available) the only other fix has to lie outside of procmail, namely fix
BT> the system to use a better-scaling filesystem, one whose performance doesn't
BT> degrade so viciously as the number of directory entries grows large.
BT> Reiserfs[1] claims to be one such, though I haven't tried it out. NetApp's
BT> WAFL is another, and SGI has a third, and that's all the alternatives I've
BT> heard about.
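Bennett's idea of not creating the dotlock file until a kernel lock is held could look roughly like this. It is a hypothetical sketch, `lock_mailbox_kernel_first()`, not procmail code. It assumes every deliverer on the host takes the same fcntl() lock on the mailbox, and it uses `O_CREAT | O_EXCL` for brevity even though that is unreliable on older NFS (the reason for the link() trick Gjermund describes).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: block in the kernel on an fcntl() write lock on the
 * mailbox first, and only then create the dotlock.  Waiters queue inside the
 * kernel instead of each churning scratch files in the spool directory.
 * Returns the open mailbox fd (still locked) on success, -1 on failure. */
static int lock_mailbox_kernel_first(const char *mbox, const char *lockpath)
{
    int fd = open(mbox, O_RDWR | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;       /* exclusive lock ...                    */
    fl.l_whence = SEEK_SET;    /* ... on the whole file (l_len == 0)    */
    while (fcntl(fd, F_SETLKW, &fl) == -1) {
        if (errno != EINTR) {  /* retry only if a signal interrupted us */
            close(fd);
            return -1;
        }
    }

    /* Only the kernel-lock holder gets here, so at most one deliverer on
     * this host competes for the dotlock; a mail reader may still hold it. */
    int lfd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (lfd < 0) {
        close(fd);             /* closing drops our fcntl lock; retry later */
        return -1;
    }
    close(lfd);
    return fd;                 /* caller: unlink(lockpath), then close(fd) */
}
```

The trade-off is that contention moves from the directory into the kernel's lock queue, which scales much better than a growing spool directory, but only helps if every program that writes the mailbox honours the same fcntl() lock.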


BT> One more thought about leaving that dotlock file around while waiting to
BT> acquire; it'd be a kindness to install a signal catcher for some common
BT> possibilities like INTR, HUP, and TERM, to clean up the scratch file before
BT> exiting.
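A signal catcher of the kind Bennett suggests might be sketched as below. The names `scratch_path`, `cleanup_and_die()` and `install_cleanup_handlers()` are hypothetical; this is not a description of what procmail actually installs. It removes the scratch file on SIGINT, SIGHUP or SIGTERM and then re-raises the signal so the process still dies with the original status.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: the scratch file to remove if we are killed.  Set it before
 * installing the handlers; a fixed buffer avoids malloc() in the handler. */
static char scratch_path[4096];

static void cleanup_and_die(int sig)
{
    unlink(scratch_path);   /* unlink() is async-signal-safe */
    raise(sig);             /* SA_RESETHAND restored SIG_DFL, so this kills us
                               with the original signal once it is delivered */
}

static int install_cleanup_handlers(const char *path)
{
    size_t len = strlen(path);
    if (len >= sizeof scratch_path)
        return -1;                      /* path would not fit */
    memcpy(scratch_path, path, len + 1);

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = cleanup_and_die;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;         /* one-shot: default action on re-raise */

    const int sigs[] = { SIGINT, SIGHUP, SIGTERM };
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) == -1)
            return -1;
    return 0;
}
```

Re-raising instead of calling exit() keeps the parent (sendmail, here) informed that the delivery agent was killed rather than finishing normally.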

I haven't looked at the source code for this, but when I killed a
procmail, the dotlock file did disappear for a split second before
another procmail recreated it.

There is still the problem of:

        fcntl(8, F_SETLKW, 0x00032FB0)  (sleeping...)

ALL my suspended procmails are in this state if they are not waiting
to link the dotlock file.  So there still seems to be another race
condition.
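To find out whom the procmails sleeping in fcntl(F_SETLKW) are waiting for, one could ask the kernel with F_GETLK. The helper below, `who_holds_lock()`, is a hypothetical standalone diagnostic. It assumes the waiters want a write lock on the whole mailbox (the truss line shows only fd 8 and a pointer, not the lock details), and on NFS the reported PID may not be meaningful.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical diagnostic: would a write lock on all of `path` be blocked,
 * and if so by which process?  Returns the PID of a conflicting holder,
 * 0 if nothing conflicts, or -1 on error.  Run it from a separate process:
 * the close() below releases any fcntl locks the caller holds on the file,
 * and the caller's own locks never show up as conflicts anyway. */
static pid_t who_holds_lock(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;       /* the kind of lock we assume is requested */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;              /* 0 means "through end of file" */

    int rc = fcntl(fd, F_GETLK, &fl);
    close(fd);
    if (rc == -1)
        return -1;
    /* F_GETLK sets l_type to F_UNLCK when no existing lock conflicts. */
    return fl.l_type == F_UNLCK ? 0 : fl.l_pid;
}
```

For example, calling it on /var/mail/user while deliveries are stuck and then looking up the returned PID with `ps -p` would show whether the holder is another procmail, a mail reader, or a process that has hung while holding the lock.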

        --Ralph

Dr. Ralph P. Sobek                Disclaimer: The above ruminations are my own.
Ralph(_dot_)Sobek(_at_)irit(_dot_)fr               Addresses are ordered by importance.
sobek(_at_)irit(_dot_)fr                                     If all else fails, try:
newsmaster(_at_)irit(_dot_)fr, postmaster(_at_)irit(_dot_)fr   sobek(_at_)diva(_dot_)eecs(_dot_)berkeley(_dot_)edu
Ph:(+33)[0]561558618  FAX:(+33)[0]561556258  http://www.irit.fr/~Ralph.Sobek/
===============================================================================
Urgent!! Greenhouse Effect: http://www.irit.fr/~Ralph.Sobek/greenhouse.html