procmail

Re: mail corruption with dotlock/nfs

2008-03-05 13:22:06
On Mar  2 11:29 Bart Schaefer writes:
The dotlocking scheme is supposed to enforce this order of events:

Process X creates the lock file
Server creates lock file inode
Process Y encounters the lock file and waits for it to be removed
X opens the mailbox
X writes the message to the file
X closes the file
Server flushes the write to disk
X removes the lock file
Server removes the lock file inode
Y creates the lock
(repeat open/write/close/flush/unlock)

The problem is that with an async mount, the server is allowed to
delay creating the lock file inode or to change the order of "flushes
the write" and "removes the inode", either of which can cause Y to
open and write the file before the server flushes X's changes, and
then all bets are off.  This is especially problematic if X and Y are
on different NFS clients, where the state of the file may not even
appear the same when they begin the operation.
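
For illustration, the order of events above can be sketched from the
client's side in a few lines of Python. This is only a minimal sketch, not
procmail's actual locking code: the names acquire_dotlock and deliver, the
".lock" suffix, and the retry parameters are all assumptions made for the
example. It relies on O_CREAT|O_EXCL being atomic, and the fsync() call only
asks for the data to be pushed to the server before the lock is removed;
whether the server orders things as described is exactly what is in question
on an async mount.

```python
import os
import time


def acquire_dotlock(lock_path, retries=50, delay=0.1):
    """Try to create lock_path exclusively; return True once we hold it."""
    for _ in range(retries):
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists,
            # which means another process currently holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            time.sleep(delay)  # wait for the holder to remove the lock
            continue
        os.close(fd)
        return True
    return False


def deliver(mailbox_path, message, retries=50, delay=0.1):
    """Append message (bytes) to the mailbox while holding the dotlock."""
    lock_path = mailbox_path + ".lock"  # hypothetical naming for this sketch
    if not acquire_dotlock(lock_path, retries, delay):
        return False
    try:
        with open(mailbox_path, "ab") as mailbox:
            mailbox.write(message)
            mailbox.flush()
            # Ask for the data to reach stable storage before unlocking.
            os.fsync(mailbox.fileno())
    finally:
        # Remove the lock only after the mailbox has been closed.
        os.unlink(lock_path)
    return True
```

If another process already holds the lock and never releases it, deliver()
gives up after the configured retries and returns False without touching the
mailbox.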

I am almost certain you have identified our problem.  Here is what I have
learned since I last wrote.

1. Apparently, a Linux client mounts an NFS file system async by default.
That is, it mounts it async without an explicit "async" in the mount
options list.  How else do I explain the observed 10x(!) decrease in
performance when I add "sync" to the options list?  Throughput dropped
from 69.2 MB/sec to 7.40 MB/sec.  I am shocked by this discovery.
Are there any other ways to interpret it?  I will probably have to sniff
the wire to learn for sure whether we have an async mount.
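
Short of sniffing the wire, one quick check is to look at the options the
client itself reports for the mount. The sketch below parses text in the
/proc/mounts format and reports whether "sync" appears in the option list of
each NFS entry. The function names and the sample line in the usage note are
made up for illustration, and I am assuming that a sync mount shows up with a
literal "sync" option there. It only tells you what the client believes, not
what the server actually does.

```python
def nfs_mount_options(mounts_text):
    """Return {mount_point: set_of_options} for NFS entries.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type in ("nfs", "nfs4"):
            result[mount_point] = set(options.split(","))
    return result


def is_sync_mount(options):
    """True if the option set explicitly contains "sync"."""
    return "sync" in options
```

For example, feeding it the (hypothetical) line
"filer:/vol/mail /var/mail nfs rw,sync,vers=3,hard 0 0" yields an entry for
/var/mail whose options include "sync".  On a real client you would pass in
the contents of /proc/mounts.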

2. NetApp (our NFS server) apparently does not support asynchronous
writes.  From NetApp's documentation(1):

        NetApp filers only support synchronous mounts. If
        an NFS client is mounted to a filer volume or qtree
        asynchronously, the client may or may not mount the
        filer volume. If the mount is successful, the NFS
        mount to the filer will be synchronous.

and later in the same document:

        Using an asynchronous NFS mount could lead to data
        inconsistency or corruption

Yikes!  This is also shocking.  It suggests the server may silently
ignore the client's request.  Since the client silently requested
async, the user can be silently screwed.

(1) In fairness, the NetApp document (Solution ID: ntapcs6821) is
dated April 2005, and may no longer be accurate.  I have requested
clarification.

As it stands now, my position seems to reduce to: "Which do you want,
performance or integrity?"  In any case, Bart, thank you so much for
pointing me in the right direction.  This has been a real education for
me.

Fletcher
____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail