Re: perl script to remail contents of mbox file
2004-06-10 20:37:51
At 14:41 2004-06-10 -0400, Jeff A. Earickson wrote:
> ... be used to recover from procmail rule disasters, or it can be used to
> forward already-delivered email to another address. We hope that
> procmail sysadmins find this script useful.
I believe you meant to say that you hosed a rule in /etc/procmailrc, not "my
procmail rules" (since redelivery to YOURSELF is an ultra-trivial
matter). You have a problem in that the messages as stored won't contain
any information positively identifying who they were intended for. If you
rely on the To: or a "Received: ... for" header giving you this
information, you're going to be subject to a LOT of false information, and
very likely pissing off a lot of people (including people on mailing lists
to which you may redeliver messages). Multiple recipients and BCCs in
particular are issues.
If you need to recover from THAT sort of catastrophe, I'd suggest leaning
on the system maillog as a resource for sorting out who a message was
originally intended for, and re-running the mailbox through a special rcfile:
formail -s procmail -m i_fscked_up.rc < thrashedmail.mbx
(Note that thrashedmail.mbx should NOT be a mailbox that your
/etc/procmailrc (or any other procmailrc, for that matter) would be
delivering mail into, lest you generate yourself an endless loop.)
The i_fscked_up.rc would:
* take the topmost received header and extract the (E)SMTP message ID from
it. For example, the message to which I am replying arrived at my host
with the following topmost received header:
Received: from ms-dienst.rz.rwth-aachen.de (ms-1.rz.RWTH-Aachen.DE
[134.130.3.130])
by mailhost.domain.tld (8.12.10/8.12.10) with ESMTP id i5AIrbh9015749
for <my_address(_at_)some(_dot_)domain(_dot_)tld>; Thu, 10 Jun 2004
11:53:38 -0700
That "i5AIrbh9015749" bit is of interest to us.
You could use formail to get this header, but you'll still need to process
it further to get the SMTP ID from it, and since the topmost received
header should be locally inserted AND contain your local mailhost name, the
following expression should grab it handily:
:0
* ^Received:.*by mailhost\.domain\.tld.* with E?SMTP id \/[a-z0-9]+
{
SMTP_ID=$MATCH
}
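To check what that recipe will pick up before trusting it, roughly the same match can be tried outside procmail with sed. This is only a sketch: it assumes the Received: header has already been unfolded onto one line (formail -c does that), and extract_smtp_id is a hypothetical helper name, not part of procmail or formail.

```shell
# Hypothetical helper: print the (E)SMTP queue id from a Received: header
# that was added by the given mailhost. Assumes the header is unfolded
# onto a single line (e.g. via formail -c).
extract_smtp_id() {
  # $1 = mailhost name; header text is read from stdin.
  # Escape the dots in the host name so they match literally.
  host=$(printf '%s\n' "$1" | sed -e 's/\./\\./g')
  # Print only the id following "with SMTP id" or "with ESMTP id".
  sed -n -e "s/^Received:.*by ${host} .* with E\{0,1\}SMTP id \([A-Za-z0-9]\{1,\}\).*/\1/p"
}
```

For example, feeding it the header quoted above (joined onto one line) with mailhost.domain.tld should print i5AIrbh9015749, and a header from any other host prints nothing.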
Using that ID, the recipe would then grep your maillog file. Optimally,
you might pre-process your maillog to strip it down to smtpids and their
related local recipients (which could be an external process), and thus
finding a list of recipients would be a simple one-line grep operation. We
won't assume that's been done, and instead will simply do all the work
here. Note this is NOT going to be a light CPU load. Then again, someone fscked
up, and this shouldn't need to be run more than the one big time, right?
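If you do want the pre-processing route, it could look something like this sketch. It assumes sendmail-style log lines like the ones quoted next (one line per entry in the real log, not wrapped as they are in this message); build_index and lookup_recipients are hypothetical names. The index is built once, so each message afterwards costs a small lookup rather than a scan of the whole maillog.

```shell
# Hypothetical pre-processing: reduce a sendmail maillog to
# "queueid address" pairs, keeping only local (mailer=local) deliveries.
build_index() {
  # $1 = maillog path (or an archived copy of it)
  sed -n -e 's/^.*\[[0-9]*]: \([A-Za-z0-9]*\): to=<\([^>]*\)>,.* mailer=local,.*$/\1 \2/p' "$1"
}

# Print every local recipient address recorded for one queue id.
lookup_recipients() {
  # $1 = index file produced by build_index, $2 = SMTP queue id
  awk -v id="$2" '$1 == id { print $2 }' "$1"
}
```

Usage would be along the lines of `build_index /var/log/maillog > smtpid.idx` once, then `lookup_recipients smtpid.idx "$SMTP_ID"` per message.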
A simple grep for this ESMTP id would return something like:
Jun 10 11:53:38 trei sm-mta[15749]: i5AIrbh9015749:
from=<procmail-bounces(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE>, size=10158, class=-30,
nrcpts=1, msgid=<Pine(_dot_)GSO(_dot_)4(_dot_)58(_dot_)0406101432170(_dot_)2341(_at_)garnet>, proto=ESMTP,
daemon=MTA, relay=ms-1.rz.RWTH-Aachen.DE [134.130.3.130]
Jun 10 11:53:43 trei sm-mta[15750]: i5AIrbh9015749:
to=<my_address(_at_)some(_dot_)domain(_dot_)tld>, delay=00:00:05, xdelay=00:00:05,
mailer=local, pri=94465, dsn=2.0.0, stat=Sent
Hopefully your MTA uses SMTP ids and logs something worthwhile - if not,
then this approach won't work, and you really should look at switching to a
worthwhile MTA...
That first log line merely comes up in the trivial grep operation - we
don't need it (and the revised grep below will in fact not return it in the
result set), but it's worth noting the nrcpts= token in there -- where that
is >1 is where a conventional use-the-message-itself fixup approach would
have significant problems (besides the other issues with that approach).
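Since nrcpts>1 marks exactly the messages a header-based fixup would get wrong, it may be worth counting them before picking an approach. A minimal sketch, again assuming sendmail's from= log line format as quoted above; multi_recipient_ids is a hypothetical name.

```shell
# Hypothetical audit: print the queue ids whose from= log line reports
# more than one recipient (nrcpts > 1).
multi_recipient_ids() {
  # $1 = maillog path
  sed -n -e 's/^.*\[[0-9]*]: \([A-Za-z0-9]*\): from=.*, nrcpts=\([0-9]*\),.*$/\1 \2/p' "$1" |
    awk '$2 > 1 { print $1 }'
}
```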
The to= bit and the mailer=local bit are the significant bits of the second
log line. mailer=local means they'd be messages which were locally
delivered and thus (assuming procmail is your LDA), subject to your
/etc/procmailrc. Deliveries to file aliases and remote users would be
excluded, since THEIR copies wouldn't have been affected and they won't
show a mailer=local, even though they were received and processed by your
mail host.
So, our grep operation might be:
RECIP_RAW=`grep "$SMTP_ID: to=.* mailer=local" /var/log/maillog`
Thankfully, as only local deliveries should have been affected, you needn't
use sendmail to effect re-delivery and thus do not need to recover the
original envelope sender data (available in the From_ line), which is
doable, but just more work.
However, sendmail still figures into the equation, as per below.
Even this method will not positively identify each
true local recipient specified in the envelope -- where someone (usually a
spammer, but let's say you have some virtual domains where someone sorts
multiple mailboxes from their one mailbox) specifies multiple addresses
which resolve to the same local user, only ONE of those addresses will be
logged to the maillog (and only ONE message should actually be
delivered). This isn't really a problem, since the MTA itself discarded
the additional recipient copies, so you're no worse off than what the MTA
was doing in the first place.
In the event of multiple recipients, a raw grep would find more log lines,
but the above grep would still focus on the actual _local_ ones.
Since there is a separate log line for each recipient locally delivered to,
a simple sed operation tacked onto the above greppage will clear the cruft:
| sed -e "s/^\(.*to=<\)\([^>]*\)\(>, .*\)$/\2/"
This nets us raw recipient addresses, one per line (a little more sed
scripting, and they'd be on one line, but this isn't necessary at this stage).
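For a dry run outside procmail, the grep and sed steps above can be wrapped in a small function and pointed at a copy of the maillog. local_recipients is a hypothetical name; the patterns are the ones already used above.

```shell
# Hypothetical wrapper around the grep + sed steps: print the local
# recipient addresses logged for one SMTP queue id, one per line.
local_recipients() {
  # $1 = SMTP queue id, $2 = maillog path
  grep "$1: to=.* mailer=local" "$2" |
    sed -e 's/^\(.*to=<\)\([^>]*\)\(>, .*\)$/\2/'
}
```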
Now, what you have is a list of recipient ADDRESSES. But locally, you want
the userids. Invoke sendmail with these, in "address verification mode",
which will expand the addresses, shown here with the followup scrubber sed
operations:
RECIP_INTER=`sendmail -bv $RECIP_RAW \
| sed -e "s/^\(.*deliverable: mailer local, user \)\(.*\)$/\2/" \
| sed -e :a -e '$!N' -e '$!ba' -e 's-\n-\ -g'`
The resulting variable contains the local usernames to which the message
should be delivered.
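The wording of sendmail -bv output can vary between versions and configurations, so it is worth testing the scrubbing step on captured output first. In this sketch the input lines are assumed to look like "address... deliverable: mailer local, user name", and local_users is a hypothetical name. Unlike the sed above, it drops any line that does not report mailer local instead of passing it through, and it joins the names with paste -s rather than the sed loop.

```shell
# Hypothetical scrubber for captured `sendmail -bv` output (read on stdin):
# keep only local deliveries, print the usernames on one line.
local_users() {
  sed -n -e 's/^.*deliverable: mailer local, user \(.*\)$/\1/p' |
    paste -s -d ' ' -
}
```

Dropping unmatched lines means a remote or failed expansion can never end up as an argument to procmail -d.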
There is one issue, and that's dupes - your procmailrc will have shuttled
multiple copies of messages - one for each local recipient. You should be
able to clear those out using a messageid cache, more or less straight from
the procmailex manpage:
# you fscked up - the bigger the source mailbox, the bigger the cache
# should be in order to ENSURE you're not duplicating message
# deliveries.
:0 Whc: msgid.lock
| formail -D 100000 msgid.cache
:0 a:
duplicates.mbx
There is the remote chance that a message (say, from a mailing list) will
have the same messageid and be destined for multiple users handled at your
host - say because the recipients have different domains and different
backup MX's and your host wasn't immediately reachable when the list
message was delivered (or someone is using a mail forwarder, etc), and as a
result each of them will have a separate smtpid associated with them. So,
we need to formulate a way to cache based on the smtpid rather than (or in
conjunction with) the messageid. This works to eliminate dupes caused by
the redelivery aspect, but also ensures that each recipient will receive
whatever number of copies they would have originally (since we don't
actually weed out the dupes they would have received originally).
Of course, if we're using the SMTPID for dupe checking, nearly all duplicates
should be near consecutive to one another, so the large msgid cache issue
is moot: a small one will suffice (plus, our SMTPIDs are MUCH shorter, and
a given filesize will accommodate 3-5 x as many cache entries).
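The difference between the two cache keys shows up in a toy model. The sketch below is not formail: it keeps every key in memory (formail -D uses a cache file of the size you give it) and simply keeps the first line seen for each key. Each input line stands for one copy found in the trashed mailbox, written as "smtpid msgid"; first_by_field and the ids in the example are made up.

```shell
# Hypothetical toy model of duplicate suppression: keep the first line
# seen for each value of the chosen key field, drop later repeats.
first_by_field() {
  # $1 = field number to use as the duplicate key (1 = smtpid, 2 = msgid)
  awk -v k="$1" '!seen[$k]++'
}
```

With two copies from transaction i5AIrbh9015749 and one from a second transaction carrying the same list Message-Id, keying on the msgid (field 2) keeps one line and silently loses the second transaction's recipients, while keying on the smtpid (field 1) keeps one line per transaction, and the maillog lookup then supplies each transaction's recipients.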
You'd of course do this dupe check before wasting your time on the rest of the
recipient identification operations, but after the SMTPID extraction.
Final delivery would be:
procmail -d recipient recipient recipient
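Before handing real messages to procmail -d, a dry-run wrapper along these lines (in the spirit of the "prefix the command with echo" tip in the rcfile comments) can show what would be run. deliver_to and the DRY_RUN variable are hypothetical; the real delivery still needs root.

```shell
# Hypothetical wrapper for the final delivery step. The message is read
# from stdin; the arguments are local usernames from the sendmail -bv step.
deliver_to() {
  if [ "$#" -eq 0 ]; then
    # Nothing to deliver to: fail so the caller can file the message
    # for manual handling instead.
    echo "no recipients; leaving message for manual handling" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo procmail -d "$@"   # show the command only
  else
    procmail -d "$@"        # actual delivery
  fi
}
```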
So, let's thread all of that together into one big happy rcfile (look ma,
no perl!). I know the following to work, at least for sendmail as
installed on my hosts, since I ran a test of it today after I'd written it:
# BEGIN i_fscked_up.rc
# to reprocess a misdelivered mailbox.
# invoke with something like:
#
# formail -s procmail -m i_fscked_up.rc < thrashedmail.mbx
#
# should be invoked as root (necessary for access to /var/log/maillog as well
# as procmail -d)
#
# you're a schmutz if you need to be running this script in the first place,
# so it'd make sense to log the hell out of what happens when you have to
# run it, in case you fsck up again.
LOGFILE=i_fscked_up.log
VERBOSE=on
# just as it appears in the received: headers on the fscked-up deliveries
MY_MAILHOST="your_mailhost_name"
:0
* ^Received:.*by $\MY_MAILHOST.* with E?SMTP id \/[a-z0-9]+
{
SMTP_ID=$MATCH
# Unfortunately, formail doesn't process in argument order, and thus
# msgid can't be rewritten in a single invocation.
:0 Whc: msgid.lock
| formail -I"Message-Id: <$SMTP_ID>" | formail -D 8192 msgid.cache
:0 a:
duplicates.mbx
# Now, grep the maillog (or an archived copy of it)
# for the SMTPID as it pertains to local deliveries of the message.
RECIP_RAW=`grep "$SMTP_ID: to=.* mailer=local" /var/log/maillog \
| sed -e "s/^\(.*to=<\)\([^>]*\)\(>, .*\)$/\2/"`
:0
* ! RECIP_RAW ?? ^^^^
{
# note we're invoking sendmail directly here - if you use a different
# MTA, things are likely to be radically different (hell, this whole
# recovery approach might not work for you at all).
RECIP_INTER=`sendmail -bv $RECIP_RAW \
| sed -e "s/^\(.*deliverable: mailer local, user \)\(.*\)$/\2/" \
| sed -e :a -e '$!N' -e '$!ba' -e 's-\n-\ -g'`
}
# for testing purposes (I dunno, like the FIRST time you run this script
# before you're POSITIVE it'll work for your system), you could shuttle
# this to one mailbox file just so you can see what messages are
# ultimately identified as deliverable to some user.
# alternatively, prefix the command with echo, and add the 'i' flag
:0
* ! RECIP_INTER ?? ^^^^
| procmail -d $RECIP_INTER
}
# anything lacking a Received: header from MY_MAILHOST (and thus an SMTP id),
# or which failed to result in recipients being extracted from the logfile,
# will end up here, as well as delivery
# failures by the procmail invocation above. Store in a mailbox for manual
# examination. Most likely cause for ending up here is messages which are
# older than the maillog reflects.
:0:
unhandled_redelivery.mbx
# END i_fscked_up.rc
FTR, a majordomo listprocessor frontend I wrote makes a backup of messages,
and in that, it saves the parameters which were passed to procmail. This
allows for simple recovery from list problems (all lists are archived to
the same backup file), by formail splitting the edited backup against a
simple recovery script. This alleviates the problems associated with
screwups, and doesn't require special access to logfiles and the like. It
is also considerably faster than the above script will end up being when run
against a large spool of messages...
NOW, nobody can say there's not a writeup of the proper way to recover from
an /etc/procmailrc filing fsckup.
---
Sean B. Straw / Professional Software Engineering
Procmail disclaimer: <http://www.professional.org/procmail/disclaimer.html>
Please DO NOT carbon me on list replies. I'll get my copy from the list.
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail