procmail

Re: "Kernel-unlock failed" - possible mail loss issue?

1997-10-23 22:27:33
nelson(_at_)media(_dot_)mit(_dot_)edu (Nelson Minar) writes:
...
> My circumstance is a bit weird. I believe the culprit is that I
> frequently run bash code like this:
>   [ \! -s swarm.mbox ] && rm swarm.mbox; ls -l *.mbox
> (The purpose of this code is to delete empty mailboxes. It's not an
> atomic check, which is why I think this is all ultimately my fault.)

Yep.  You should change this to:

        if lockfile swarm.mbox.lock; then [ -s swarm.mbox ] ||
        rm swarm.mbox; rm -f swarm.mbox.lock; fi
        ls -l *.mbox

Or if you're feeling obscure:

        for mbox in swarm; do
            lockfile $mbox.mbox.lock &&
            { test -s $mbox.mbox || rm $mbox.mbox; rm -f $mbox.mbox.lock; }
        done
        ls -l *.mbox
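
The same idea can be wrapped in a small function so that it works for any number of mailboxes. This is only a sketch: the function name rm_if_empty is made up for illustration, and it assumes that procmail delivers to these mailboxes with a local lockfile (so that it honours the same <mailbox>.lock file). The -r option of lockfile(1) limits the number of retries, so a stale lock does not make the cleanup hang forever.

```shell
# Sketch: remove a mailbox only if it is empty, holding the dotlock
# (<mailbox>.lock) across the whole check-and-remove step.
# rm_if_empty is a hypothetical name; lockfile(1) ships with procmail.
rm_if_empty() {
    mbox=$1
    # -r 5: give up after 5 retries instead of waiting forever.
    lockfile -r 5 "$mbox.lock" || return 1
    # While the lock is held, a cooperating writer cannot append
    # between the size test and the rm.
    [ -s "$mbox" ] || rm -f "$mbox"
    # lockfile creates the lock read-only, hence rm -f.
    rm -f "$mbox.lock"
}
```

It could then be called as `for m in *.mbox; do rm_if_empty "$m"; done` before the `ls -l *.mbox`. Note that this only protects against writers that take the same lockfile; a process relying on kernel locking alone would not be held off.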


> The directory with my email is NFS mounted both on the machine that's
> running procmail and the machine that's running that empty mailbox
> code. I think there's a race condition happening. My guess is that at
> the moment I do the test, the file is in fact empty. But procmail has
> it open and writes to it while my bash function is executing, and then
> bash removes the file before procmail is done. So procmail then goes
> to unlock the file, which now no longer exists, and the kernel unlock
> fails.

Exactly.
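
To make the interleaving concrete, here is a small sequential replay of it in a scratch directory. It is an illustration under assumptions, not a reproduction of what procmail does internally: the delivery is stood in for by a plain append, and the variable names are made up. The three steps are run in the order suspected above: the size test, then the delivery, then the rm.

```shell
# Replay of the suspected race, with the steps forced into the bad order.
dir=$(mktemp -d)
mbox=$dir/swarm.mbox
: > "$mbox"                       # the mailbox starts out empty

# Step 1 (cleanup job): the size test sees an empty file.
is_empty=no
if [ ! -s "$mbox" ]; then
    is_empty=yes
fi

# Step 2 (delivery, simulated by a plain append): a message arrives.
printf 'From someone\n\nhello\n' >> "$mbox"

# Step 3 (cleanup job): acts on the stale test result and removes the file.
if [ "$is_empty" = yes ]; then
    rm "$mbox"
fi

# The message written in step 2 went away with the file.
if [ -e "$mbox" ]; then
    result=kept
else
    result=lost
fi
echo "message $result"
```

Holding swarm.mbox.lock across steps 1 and 3 is what keeps step 2 from landing in between, provided the writer takes the same lock.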


> Again, my question really is can or should procmail do anything to
> handle this case? The code in question is in mailfold.c, the function
> dump():
>
>        int serrno=errno;                      /* save any error information*/
>        if(tofile&&fdunlock())
>           nlog("Kernel-unlock failed\n");
>        SETerrno(serrno);
>
> It looks like the assignment to serrno is to deliberately *ignore* the
> error from fdunlock(). Is that the right thing to do?

I don't think I know enough about the failure modes of kernel-locking
to be able to say what would be safe here.  From what I've seen on the
list it seems that this generally occurs when a) kernel locking is
hosed, in which case the message was delivered (but might be
corrupted); or b) the mailbox was removed from under procmail, in which
case the message was delivered correctly, but *some* other process is
not careful enough and mail is being lost.  What would be "safe"?  Just
writing the message about it isn't safe, and neither is truncating and
returning an error.  What's that leave?


> While I'm here, if someone knows of a safer way to delete 0 length
> mailbox files out from under procmail, I'd love to know. I hate NFS.

See above.


Philip Guenther
