procmail

Re: mail corruption with dotlock/nfs

2008-03-10 03:21:55

On Fri, 29 Feb 2008, Fletcher Mattox wrote:

Hi,

After many years of flawless procmail performance as our local mail
delivery agent, we are now seeing frequent corruption in our users'
mailboxes.  The corruption seems to always be of the form:

      \000 ... \000From user@dom.ain date

where \000 is the null byte.  That is, we are seeing a series of nulls
(30 to 3000+) prepended to the message (or perhaps appended from the
previous message).
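
A minimal way to measure how widespread this is might be to scan each mailbox (or procmail logfile) for runs of NUL bytes and note whether each run is immediately followed by a "From " separator line, which is the pattern described above.  The helper names `find_nul_runs` and `scan_file` below are hypothetical, chosen for illustration; the script only reads files and assumes they fit in memory.

```python
import re
import sys

# One or more consecutive NUL bytes.
NUL_RUN = re.compile(rb"\x00+")


def find_nul_runs(data: bytes, min_len: int = 1):
    """Return (offset, length, followed_by_from) for each NUL run of at least min_len bytes."""
    runs = []
    for m in NUL_RUN.finditer(data):
        length = m.end() - m.start()
        if length >= min_len:
            # True when the run ends right where an mbox "From " line begins.
            runs.append((m.start(), length, data.startswith(b"From ", m.end())))
    return runs


def scan_file(path: str, min_len: int = 1):
    """Read a mailbox or logfile as raw bytes and report its NUL runs."""
    with open(path, "rb") as f:
        return find_nul_runs(f.read(), min_len)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        for offset, length, before_from in scan_file(path):
            where = "before a From_ line" if before_from else "elsewhere"
            print(f"{path}: {length} NUL bytes at offset {offset} ({where})")
```

Run periodically over the affected mailboxes, the offsets and run lengths could then be correlated with delivery times in the strace data.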

We deliver to the user's home directory mounted via NFS from a Network
Appliance file server.  Procmail runs on ubuntu (dapper) linux and
is compiled to use only dotlock, no kernel-based file locking.  I have
generated tons of strace data in which I measure the time stamps of lock
creation and deletion for procmail, imapd, and the mail-reading agents that
access the mailbox directly.  It looks perfectly normal.  The file is
locked for 3-7 milliseconds, and when the corruption happens, potential
file locking "collisions" are always well separated in time, by several
seconds.  I can find nothing in our environment that changed, even though
the corruption started very suddenly about two weeks ago.  As I said,
we have been running this configuration for years on a medium-sized
server (1500 users, 40,000 messages/day) without any trouble whatsoever.
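
For readers unfamiliar with dotlocking over NFS, the sketch below shows the general link()-based technique described in the Linux open(2) man page as a workaround for O_EXCL being unreliable on older NFS: create a uniquely named file in the same directory, link() it to the lock name, and treat the lock as acquired only if the unique file's link count is then 2.  This is an assumption-laden illustration, not procmail's actual locking code; the function names are hypothetical, and there is no retry loop or stale-lock handling.

```python
import os
import socket
import tempfile
import time


def acquire_dotlock(lockpath: str) -> bool:
    """Try once to create lockpath; return True if this process now holds it."""
    directory = os.path.dirname(os.path.abspath(lockpath))
    # Unique temp name in the same directory, since link() cannot cross filesystems.
    tmp = os.path.join(
        directory, f".lk.{socket.gethostname()}.{os.getpid()}.{time.time_ns()}"
    )
    fd = os.open(tmp, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        try:
            os.link(tmp, lockpath)
        except OSError:
            # Over NFS, link() can report failure even though it succeeded on
            # the server, so the link count below is the real test.
            pass
        return os.stat(tmp).st_nlink == 2
    finally:
        # The lock file itself survives as the remaining link if we won.
        os.unlink(tmp)


def release_dotlock(lockpath: str) -> None:
    """Remove the lock file; only the current holder should call this."""
    os.unlink(lockpath)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        lock = os.path.join(d, "mbox.lock")
        print("first acquire:", acquire_dotlock(lock))   # True
        print("second acquire:", acquire_dotlock(lock))  # False while held
        release_dotlock(lock)
```

Checking the link count instead of trusting link()'s return value is the part that matters on NFS, where a retransmitted request can make a successful link() appear to fail.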

Oh yeah.  One possible hint: some users report exactly the same type of
corruption in their personal procmail logfiles, i.e.

      \000 ... \000From user@dom.ain date
         Subject: whatever
         Folder: /u/fletcher/mailbox                  1234

Does this ring a bell with anyone?  It is driving me crazy!


         Yes, I had this problem.  In my case it wasn't an NFS problem,
         and after several forced fsck(8) runs (more than one) the
         problem disappeared.  I don't know how to do that on a NetApp,
         but are you sure it's an NFS problem?  Maybe it is a
         disk-recovery problem on the NetApp itself?

Bye,
  Udi



Failing that, does anyone have suggestions on how to best instrument
procmail to debug this behavior?

Thanks,
Fletcher

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail@lists.RWTH-Aachen.DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail