procmail

Locking problem

1997-11-11 13:09:25
We're seeing a locking problem with mail delivery.  Here's the general
configuration:

- all mail delivery through mail hubs running SunOS 4.1.4, sendmail 8.8.6,
  procmail 3.10

- procmail reference in hub's sendmail.cf

  Mlocal,     P=/usr/local/bin/procmail, F=lsSDFMhPfn, S=10, R=20/40,
        A=procmail -a $h -d $u

- user mailboxes are NFS mounted from fileservers (Auspex) by both mail
  hubs and by user workstations (primarily AIX 3.2.5 and 4.1.5)

The problem:

  At times, we see procmail processes on the mail hubs hang indefinitely.
  Circumstantial evidence suggests that locking information between the
  AIX clients and the NFS fileservers is lost or corrupted, leaving
  procmail waiting for a lock it will never get.  A given instance of the
  problem typically affects a single user mailbox, although I've seen
  more than one user account affected simultaneously.  The problem seems
  most likely to occur when a system reboots uncleanly: a widespread
  power outage almost always causes it, and a user power cycling a
  system may cause it.  Other causes are not yet known.

  We see this only if the user is reading their mail from an AIX
  client.  This leads me to believe that it is a locking problem between
  AIX and the Auspex fileserver.

The present cure:

  The only reliable cure we've found to date is:

  1) kill all mail processes on the mail hubs
  2) stop the rpc.lockd on the AIX client
  3) copy the user's mailbox to a new file, thus changing the inode
  4) restart the rpc.lockd on the client
  5) restart mail processes on the mail hubs

Fortunately, this occurs only about once a week on average.  However, if
the affected user is receiving a lot of mail, it can drive the load
average on all of the mail hubs high enough to force sendmail to shut
down, and mail delivery at the site stops.

Of course, all of the vendors say someone else is to blame.
Effectively, we have to prove the other vendors are clean before they
will take any significant action.  In the meantime, I need some advice.

Does procmail support any type of timeout while waiting for locks on
NFS mounted mailboxes?  If not, would it be feasible to put an alarm
and handler on the locking calls?  If procmail could time out and
leave the message queued for delivery, it would keep the load average
down and mail delivery should continue for unaffected users.  I can
deal with having the queue scanned for problem messages through my
monitoring programs, which are presently used to alert us to hung
processes.

Additionally, if anyone else has seen this type of problem, I would
appreciate any information you have.

Thanks in advance.

-- 
Keith Pyle
Systems/Network Engineering
Motorola Somerset PowerPC Design Center
keith(_at_)ibmoto(_dot_)com
