procmail

BOGUS and lockfile weirdness when under load

2004-09-30 12:20:31
I am setting up a new Red Hat Enterprise Linux mail server that delivers to mailboxes on an AFS remote file system. The default mail folder is:

/afs/@cell/common/spool/mail/MailDirs/$LOGNAME/$LOGNAME

This is compiled into procmail.

We sent a couple of test messages and got some console errors from AFS:

byte-range lock/unlock ignored; make sure that no one else is running this prog.

I went through the docs and discovered that autoconf runs tests on remote directories to determine a suitable locking method. Since I am using a Red Hat source RPM, I looked at the autoconf output during the rpmbuild and discovered that Red Hat hardwires the locking method in the Makefile to enable fcntl() only.

I was able to modify the build so that autoconf ran the locking test on the AFS directory as well as on the local ones, and it added lockf(). [BTW, I have no way of knowing for sure whether lockf() was added because of the test on the remote directory, but I assumed it was.] The output of 'procmail -v' is now:
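Both fcntl() and lockf() request POSIX record (byte-range) locks on the mailbox file itself, and the AFS console warning says such requests are being ignored. A minimal sketch of that kind of lock follows. It assumes Python's `fcntl.lockf()`, which the Python documentation describes as a wrapper around the fcntl() locking calls on most platforms rather than a direct call to lockf(3), so read it as an illustration of the record-locking family, not of procmail's own code. The function name is hypothetical.

```python
import fcntl
import os


def append_with_record_lock(path, data):
    """Append `data` to `path` while holding an exclusive POSIX record lock.

    Hypothetical helper. length=0 with start=0 locks from the start of the
    file to end of file and beyond, i.e. the whole file, even though the
    request is still expressed as a byte range.
    """
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(fd, fcntl.LOCK_EX, 0, 0, os.SEEK_SET)
        try:
            os.write(fd, data)
            os.fsync(fd)  # push the data out before releasing the lock
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, 0, 0, os.SEEK_SET)
    finally:
        os.close(fd)
```

On a filesystem that honors record locks, two concurrent callers serialize on the `LOCK_EX` call; on one that ignores them, the call returns immediately and provides no mutual exclusion.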

procmail v3.22 2001/09/10
    Copyright (c) 1990-2001, Stephen R. van den Berg    <srb(_at_)cuci(_dot_)nl>
    Copyright (c) 1997-2001, Philip A. Guenther         <guenther(_at_)sendmail(_dot_)com>

Submit questions/answers to the procmail-related mailinglist by sending to:
        <procmail-users(_at_)procmail(_dot_)org>

And of course, subscription and information requests for this list to:
        <procmail-users-request(_at_)procmail(_dot_)org>

Locking strategies:     dotlocking, fcntl(), lockf()
Default rcfile:         $HOME/.procmailrc
        It may be writable by your primary group
Your system mailbox:    /afs/@cell/common/spool/mail/MailDirs/root/root


Hoping the problem was now solved, we began another series of tests using a shell script that fired a series of one-line messages at a test account. If we delayed each message by one second, everything went smoothly and each message was delivered, in order, to the user's mailbox.

However, when we removed the delay and let the messages go as fast as the script could loop (simulating high load on the mail server), we began to see some weirdness.
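The no-delay test can be approximated locally, without the MTA, by forking several writers at once against a single file. This is a hypothetical harness (`burst_deliver` and the "message N" line format are made up for illustration), and it only exercises fcntl()-style record locking on whatever filesystem the path lives on; running it against a local directory and then an AFS directory is one possible way to compare the two. Nothing forces the children to acquire the lock in fork order, so the lines need not come out in numeric order even when every write succeeds.

```python
import fcntl
import os


def burst_deliver(path, count):
    """Fork `count` children that each append one line to `path` at once.

    Hypothetical harness: each child holds an exclusive record lock while
    writing, mimicking one delivery per message with no delay between
    messages. Returns the lines found in the file once all children exit.
    """
    pids = []
    for i in range(count):
        pid = os.fork()
        if pid == 0:  # child process: perform one "delivery"
            status = 1
            try:
                fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
                try:
                    fcntl.lockf(fd, fcntl.LOCK_EX)
                    os.write(fd, ("message %d\n" % i).encode())
                    fcntl.lockf(fd, fcntl.LOCK_UN)
                finally:
                    os.close(fd)
                status = 0
            finally:
                os._exit(status)  # never return into the parent's code
        pids.append(pid)

    # Reap every child and count the ones that did not exit cleanly.
    failed = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            failed += 1
    if failed:
        raise RuntimeError("%d deliveries failed" % failed)

    with open(path) as f:
        return f.read().splitlines()
```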

- In the LOGFILE specified in /etc/procmailrc, we started seeing messages like this at random intervals:

procmail: couldn't unlock "/afs/@cell/common/spool/mail/MailDirs/nlcb/nlcd.lock"

- On the console we also saw what we believe was a corresponding number of procmail messages of the form:

Renamed bogus "/afs/@cell/common/spool/mail/MailDirs/nlcb/nlcd.lock" into "/afs/@cell/common/spool/mail/MailDirs/nlcb/BOGUS.nlcd.QAQwBB"

Note that it is renaming the lock file itself here (?).

- I am still seeing some of the "afs: byte-range lock/unlock" messages described above, but they occur much less often than the other messages (about 1 AFS message per 7 or 8 BOGUS renames in one of our test runs).

- All of the messages seem to be delivered, although not in the order they were sent.
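The ".lock" file in these messages is a dotlock: a separate `<mailbox>.lock` file whose existence is the lock, independent of any fcntl()/lockf() record lock. A minimal sketch of the NFS-safe dotlocking pattern described in the Linux open(2) man page follows: create a uniquely named file in the same directory, link() it to the lockfile name, and treat a link count of 2 on the unique file as success even if link() itself reported an error. This illustrates the general technique under those assumptions, not procmail's actual code; the function names and the `retries`/`delay` parameters are hypothetical. Releasing the lock is just unlinking the lockfile, so a failed unlink() is presumably the kind of event behind a "couldn't unlock" log line.

```python
import os
import socket
import time


def acquire_dotlock(mailbox, retries=10, delay=0.1):
    """Take '<mailbox>.lock' via a unique temp file plus link().

    Hypothetical helper. Returns the lockfile path, or raises TimeoutError
    after `retries` attempts.
    """
    lockfile = mailbox + ".lock"
    directory = os.path.dirname(lockfile) or "."
    # Unique name in the same directory: hostname + PID + nanosecond clock.
    unique = os.path.join(
        directory,
        ".lk.%s.%d.%d" % (socket.gethostname(), os.getpid(), time.time_ns()),
    )
    os.close(os.open(unique, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444))
    try:
        for _ in range(retries):
            try:
                os.link(unique, lockfile)
            except OSError:
                pass  # lockfile already exists, or link()'s reply was lost
            # A link count of 2 means our unique file is now also the lockfile.
            if os.stat(unique).st_nlink == 2:
                return lockfile
            time.sleep(delay)
        raise TimeoutError("could not lock %s" % mailbox)
    finally:
        os.unlink(unique)  # if we won, the lockfile name keeps the inode alive


def release_dotlock(lockfile):
    """Release the lock by removing the lockfile."""
    os.unlink(lockfile)
```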

I am not concerned that the mail is delivered out of order, but I am worried about all the errors and BOGUS renames. Does anyone have any idea what is going on here and/or how to fix it?

TIA,

--
David R. Steiner                               
david(_dot_)r(_dot_)steiner(_at_)dartmouth(_dot_)edu
UNIX System Manager                            Phone:  603.646.3127
Dartmouth College                              Fax:     603.646.1041

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
