Re: Log help - sendmail failure?

lists(_at_)professional(_dot_)org writes:

I had some unexplained sendmail failures, and I'm hoping that someone can
shed some light on the following excerpt from my log file:

----------

From useraddress  Fri Jul 18 14:12:20 1997

Subject: original subject
 Folder: /usr/sbin/sendmail -oem -odi -oi -flistname(_at_)domain         1793
procmail: Timeout, terminating "/usr/sbin/sendmail"
procmail: Program failure (-256) of "/usr/sbin/sendmail"
procmail: Kernel-lock failed
procmail: Kernel-unlock failed
--------

elsewhere I get:

procmail: Skipped "/home/listproc/temp/DAEMON.lock"

Note that logging wasn't verbose (famous last words - I hadn't had problems
up till now, so figured I could conserve disk space).  The problem has come
and gone, so it isn't as if enabling verbose now will allow me to catch
more info on this specific incident.

Several questions:
      1. Why would sendmail time out?  Now, it seems that the messages are
         successfully queued (they have since appeared in remote mailboxes,
         with the exception of one message that appeared in the system
         messagelog as being deferred), but these nastygrams are really
         concerning.


One of the arguments you pass to sendmail is "-odi".  This tells
sendmail to go into interactive delivery mode, and attempt delivery on
all the recipients right then and there.  Depending on the timeouts in
your sendmail.cf and how many recipients are involved, this can take a
*long* time.  Since you've told sendmail to handle bounces by mailing
them back to listname(_at_)domain (the -oem and -flistname(_at_)domain arguments),
why don't you just have it process the list in the background?  It'll
go out in the same time, but the parent sendmail will return once the
message has been accepted -- much sooner.  Change the "-odi" to "-odb".
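
As a rough sketch of that change, the recipe might look something like the
following.  Only the sendmail path and the -oem, -oi and -f arguments come
from the log excerpt; the condition line, the example.com address and the
SUBSCRIBERS variable are hypothetical placeholders, since the original
recipe wasn't quoted.

```
# Hypothetical forwarding recipe; adjust the condition and the
# recipient list to match the real setup.
SUBSCRIBERS="user1@example.com user2@example.com"   # placeholder

:0
* ^TO_listname@example\.com
| /usr/sbin/sendmail -oem -odb -oi -flistname@example.com $SUBSCRIBERS
```

With -odb the parent sendmail should return as soon as the message is
accepted, which ought to keep it well inside procmail's TIMEOUT (960
seconds by default, if memory serves).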

      2. Kernel-lock failed - why, where?


Since you don't include the relevant recipe, it's hard to say.  Kernel
locking is only attempted on real mailfolders, so it's not the sendmail
recipe that's generating those messages.

      3. If the kernel-lock failed, doesn't it stand to reason that an attempt
         to unlock will ALSO fail?  Why is the unlock even attempted when the
         lock fails (or is this "just in case" the lock failed because of an
         existing lock - which I foresee possible problems with too)?


You trust your kernel's return codes???  Just kidding.  I don't know
why Stephen wrote it that way.  As for unlocking someone else's lock,
you can't do that with kernel locks.  (Well, that's not totally true,
but it's true enough for this situation.  I can say that if you figure
out a way to get procmail to do that, you deserve everything you get.)

If nothing else, the duplicate message should draw your eye more, hopefully
leading you to examine the situation.

      4. The DAEMON.lock is an oddity unrelated to the current sendmail
         failure - it doesn't appear to be processed in sync with the
         individual messages.  (If I get a slew of messages, I may see several
         procmail reports with header info, THEN have a stream of five or six
         (or whatever number of messages) of these DAEMON.locks.)  The daemon
         lock is a gzip for mailer daemon messages (so they don't forward).
         I'd like to get this stuff to show up in sync - how can I do that?


The "Skipped" message can occur for four reasons:

1)      If in the action line you specify multiple mailfolders, and
        they aren't all directory style mailfolders (one of them is a
        simple file).
2)      You specified an invalid recipe flag.
3)      You put something after the locallockfile name at the beginning
        of a recipe.
4)      You had a line that didn't look like a recipe, and didn't look
        like a variable assignment (the latter because it contained
        illegal characters for a variable name).

Because of the name, I would guess (3).  Can you quote that recipe for
us?
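
To illustrate what (3) would look like, here is a made-up pair of recipes.
The lockfile and folder names are invented for the example and are not
taken from the poster's rcfile; the second recipe is only my reading of
what "something after the locallockfile name" means.

```
# Well-formed: the second colon is followed by exactly one
# locallockfile name.
:0: daemon.lock
* ^FROM_MAILER
daemon-mail

# Malformed (illustrative): "extra" follows the locallockfile name on
# the recipe's first line, so procmail would be expected to log it as
# Skipped.
:0: daemon.lock extra
* ^FROM_MAILER
daemon-mail
```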


...

What exactly is the 1793 at the end of the folder line?  I'd initially
thought it might have been a PID, but it doesn't match up to anything in my
syslog or messages file.  Apparently this is a message size?  How can I go
about tracking down the actual process results that were running at the
time?  The timestamp on the from in the log allows me to match it against
the logging of the _incoming_ message, but little else.


Yes, it is the message size.  If you want to track things, you could always
turn on verbose logging...
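
A minimal sketch of the logging settings, placed near the top of the
rcfile.  The log path is a placeholder; LOGFILE, VERBOSE and LOGABSTRACT
are the standard procmail variables.

```
LOGFILE=$HOME/procmail.log   # placeholder path
VERBOSE=on                   # extended diagnostics for each recipe
LOGABSTRACT=all              # log an abstract for every delivery
```

Setting VERBOSE back to off once the problem has been caught keeps the
log from eating the disk space you were trying to conserve.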

Also, in the message log, I see only one message that timed out while
communicating with a remote SMTP server (and was deferred) -- all other
sendmail transactions there show messages as accepted for delivery.  Yet, I
got these items in the log for about 18 separate messages.


Just because it didn't time out doesn't mean it didn't take a very long time.


Philip Guenther