Procmail hang (beware)

1997-11-04 20:49:44

System:  IBM AIX 4.2.1 running sendmail v8.8.7 and procmail v3.11pre7
         Procmail is installed as the delivery agent specified by Mlocal
         in /etc/sendmail.cf and gets invoked as: procmail -Y -a $h -d $u

Problem:

Yesterday I was called to examine a stuck queue on our sendmail server. 
The machine was unable to process and deliver messages to our local users. 
After much investigation, I finally isolated the problem to procmail
"hanging" whenever attempting to write (append) to one particular user's
mailbox. 

As mentioned, mail delivery to other users was also affected.  This
happened whenever a message recipient also happened to belong to the same
alias as the suspected user.  In other words, procmail would deliver a
message to a group of recipients belonging to the same alias just fine
until it reached the address of the suspected user and then it would hang
indefinitely, unable to deliver the message to the "bad" mailbox as well
as all the remaining recipients.  Sendmail could still make and accept
remote connections, though.

Ultimately, I was able to get the queue working again by stopping
sendmail, moving the suspected mailbox to another name, and then
restarting sendmail.  For ethical reasons, I did not examine the user's
mailbox for possible corruption.

I'm wondering what could have caused this to happen and how to fix it or
minimize the danger so it doesn't happen again.  Was this a file locking
problem, or could procmail have become hopelessly wedged trying to write
to a corrupt mailbox file?  I may never know the full story since I didn't
think to check to see if any other process may also have been trying to
access the file.  Regardless, it's bad when one user's mailbox can cause a
mail delivery system to fail for everyone else.  There should be a way for
the process to time out or otherwise fail gracefully.
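
To illustrate the kind of graceful failure I mean, here is a minimal
Python sketch.  It is not how procmail works internally.  It assumes a
dotlock convention (a "<mailbox>.lock" file next to the mailbox), and
the names acquire_dotlock, release_dotlock, timeout and poll are
hypothetical.  O_EXCL has historically been unreliable over NFS, so
treat this as a local-filesystem sketch only.

```python
import os
import time


def acquire_dotlock(mailbox, timeout=10.0, poll=0.5):
    """Try to create mailbox + '.lock'; give up after `timeout` seconds.

    Returns the lock path on success, or None if the lock stayed busy.
    (Hypothetical helper, for illustration only.)
    """
    lock_path = mailbox + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            fd = os.open(lock_path,
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            if time.monotonic() >= deadline:
                return None  # caller can defer delivery instead of hanging
            time.sleep(poll)
            continue
        # Record our PID so an admin can see who holds the lock.
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))
        return lock_path


def release_dotlock(lock_path):
    """Remove the lock file created by acquire_dotlock."""
    os.unlink(lock_path)
```

The point is that a busy lock leads to a bounded wait and a clear
"could not lock" result that the caller can turn into a deferral,
rather than an indefinite hang.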

I read about the -t flag which causes procmail to place the message back
in the queue if it can't deliver it for some reason.  I'm not sure if this
would help since in this case it would simply fail again over and over
until the underlying condition was fixed.  This also assumes that
procmail could detect the condition and re-spool the message in the
first place. 
The man page is not clear on how this behaves.
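
As I understand it, re-queueing works by the delivery agent exiting with
EX_TEMPFAIL (75 in sysexits.h), which sendmail treats as "try again
later".  A rough Python sketch of a wrapper that converts a hung
delivery into that exit status is below.  The function name
run_delivery and its parameters are hypothetical, and I have not tried
this in front of a real Mlocal definition.

```python
import subprocess

EX_TEMPFAIL = 75  # from sysexits.h: temporary failure, retry later


def run_delivery(argv, message, timeout):
    """Run a delivery command, feeding it `message` (bytes) on stdin.

    Returns the command's exit status, or EX_TEMPFAIL if it ran
    longer than `timeout` seconds and had to be killed.
    (Hypothetical wrapper, for illustration only.)
    """
    try:
        result = subprocess.run(argv, input=message, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return EX_TEMPFAIL
    return result.returncode
```

This would not fix the underlying condition, but it would keep one
stuck mailbox from holding up the remaining recipients of a message.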

My question is: has anyone else experienced this or something similar to
this before?  I'd be very grateful for any suggestions or ideas on how to
fix or prevent this situation from happening again. 

Some background information:

The mailboxes on our server reside on a disk and jfs filesystem that's
locally attached and mounted.  This same mail spool is NFS-exported so
that users can access their mailboxes from remote NFS clients.  I know
this can invite potential locking problems, but it's also a relatively
common approach for providing a centralized mail spool area.

Also, I've done my best to verify that sendmail and procmail are installed
with the correct ownership and file permissions.  Same thing with mail
spool dir and NFS export permissions (clients mount hard and intr).  I
can't think of anything else to check.

Once again thanks very much for your time.

-Gary

--
Gary B.
garyb(_at_)mhpcc(_dot_)edu
