Re: Regenerate From_ with Date in formail

2003-02-27 20:33:22
On 27 Feb, Steve Huntsberry wrote:
| I have a mbox where the From_ field is incorrect but the other
| fields are correct. I would like to regenerate the From_ field,
| using the timestamp in the Date: field. Apparently formail can
| do this, but under BUGS it says that I may not get what I want.
| 
| BUGS
|        When  formail  has  to  generate a leading `From ' line it
|        normally will contain the current  date.   If  formail  is
|        given the option `-a Date:', it will use the date from the
|        `Date:' field in the header (if present).  However,  since
|        formail  copies  it  verbatim, the format will differ from
|        that expected by most mail readers.
| 
|        If formail is instructed to delete or rename  the  leading
|        `From  '  line, it will not automatically regenerate it as
|        usual.  To force formail to regenerate it  in  this  case,
|        include -a 'From '.
| 
| I would like to ask formail to delete the current From_ field
| and regenerate it using the date in the Date: field, but using
| the format of the From_ field, which differs from Date: field.
| 
| Example:
| 
| >From shuntsbe(_at_)mail-pop Tue Feb 18 18:16:28 2003 -0800 [WRONG]
| Date: Fri, 7 Feb 2003 16:45:00 -0800 [RIGHT]
| ...

I'm confused by the example given, and am relying on the description
just above it which seems clear enough to me.  However, I'd suggest
that the Date: header is not the best place to get the date from.  It
can be off by months or more from the actual delivery date, which should
be contained in From_.  That date should be available in the last (top)
Received: header.  That header is generated by the last machine that
touches the message, so it is both reliable and predictable.  If it were
me, I'd do something like what follows. Of course, you may have some
specific reason for using the Date: header, or the Received: headers are
foobar, but I'm guessing it was just the most obvious choice. If so,
something like what follows should restore the envelope accurately, and
probably more easily than trying to account for all possible
bastardizations of the Date: header.

This depends on the format of *your* envelope From_ and Received:
headers.  Mine look like:

  From throwaway-iOF6aPey5sh6(_at_)tradersdata(_dot_)com  Fri Oct 18 16:58:43 2002
  Return-Path: <throwaway-iOF6aPey5sh6(_at_)tradersdata(_dot_)com>
  Received: [...] Fri, 18 Oct 2002 16:58:43 -0400

You might need some adjustments to conditions below for your system.
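
To make the reordering concrete, here is a minimal shell sketch of the
same rearrangement done with sed alone.  It assumes the Received: date
sits on a single line in the "Www, DD Mon YYYY HH:MM:SS" layout shown
above; the function name received_to_from_date is made up for
illustration and is not part of procmail or formail.

```shell
# received_to_from_date (hypothetical helper): read lines on stdin and,
# for each one containing "Www, DD Mon YYYY HH:MM:SS", print the date
# reordered as "Www Mon DD HH:MM:SS YYYY".  Lines without such a date
# produce no output.  Any timezone after the time is dropped.
received_to_from_date() {
    sed -n 's/^.*\([A-Z][a-z][a-z]\), \([0-9][0-9]*\) \([A-Z][a-z][a-z]\) \([0-9][0-9][0-9][0-9]\) \([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\).*$/\1 \3 \2 \5 \4/p'
}
```

Feeding it the Received: line above, for example with
printf '%s\n' 'Received: [...] Fri, 18 Oct 2002 16:58:43 -0400' | received_to_from_date
should print "Fri Oct 18 16:58:43 2002".  Unlike the rcfile, this does
not validate weekday or month names, so treat it only as a way to
eyeball the field order.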

If I munge that From_ and run it through the following rcfile, it is
"regenerated".  I've only tested it on ONE message, so take it with a
grain of salt.  For testing you might want to change the good
delivery to /dev/null and add something like the following at the end:

LOG = "$RETURNPATH $xDATE
$ENVELOPE
"

Then run your mbox file through it something like:

$ formail -ds procmail ./thisrc <mbox.file

The messages that do not have From_ regenerated end up in "nogood" so
you can try to pinpoint what's wrong, and you see on the screen how the
Received: date is rewritten for From_.  If there are too many messages
for that to be useful, then add a LOGFILE = somelogfile assignment and
view the results in somelogfile.  If there are still too many for that,
it should be trivial to write something to post-process somelogfile to
make sure the rewrite is doing what you expect before you go live.
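
One possible shape for that post-processing step, as a rough sketch
only: it assumes the log contains the two-line LOG pairs shown earlier
(a line ending in the Received:-style date, immediately followed by the
rewritten envelope line) and ignores everything else procmail writes to
the log.  The name check_rewrites is invented here.

```shell
# check_rewrites (hypothetical helper): read a procmail log on stdin.
# Whenever a line ends in "Www, DD Mon YYYY HH:MM:SS", the next line is
# expected to end in the same fields reordered as
# "Www Mon DD HH:MM:SS YYYY".  Mismatches are printed, and the exit
# status is 1 if any were found.
check_rewrites() {
    awk '
    match($0, /(Sun|Mon|Tue|Wed|Thu|Fri|Sat), [0-9]+ [A-Z][a-z][a-z] [0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) {
        # f[1]=weekday f[2]=day f[3]=month f[4]=year f[5]=time
        split(substr($0, RSTART, RLENGTH), f, /,? /)
        want = f[1] " " f[3] " " f[2] " " f[5] " " f[4]
        pending = 1
        next
    }
    pending {
        pending = 0
        n = length(want)
        if (length($0) < n || substr($0, length($0) - n + 1) != want) {
            print "line " NR ": expected tail \"" want "\" but got: " $0
            bad++
        }
    }
    END { exit (bad > 0) }
    '
}
```

Run it as check_rewrites <somelogfile; no output and a zero exit status
mean every pair it recognised was reordered consistently.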

$ cat thisrc
WEEKDAYS = '(S(un|at)|Mon|T(ue|hu)|Wed|Fri)'
MONTHS = '(J(an|u[ln])|Feb|Ma[ry]|A(pr|ug)|Sep|Oct|Nov|Dec)'
TIMESTAMP = '([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]'
# dates between 1980-2003 ok. Adjust following as needed.
RCVD_STAMP = "$WEEKDAYS, [0-9]+ $MONTHS (19[89][0-9]|200[0-3]) $TIMESTAMP"

# <space><tab> in [character class] below
:0
* ^Return-Path:[        ]*<\/[^>]*>
* MATCH ?? ^^\/[^>]*
{ RETURNPATH = "${MATCH:-MAILER-DAEMON}" }

:0
* $ ^Received:.*\/$RCVD_STAMP
{ xDATE = "$MATCH" }

# short-circuit w/o data
:0:
* 1^0 RETURNPATH ?? ^^^^
* 1^0 xDATE ?? ^^^^
nogood

:0
* $ xDATE ?? ^^\/$WEEKDAYS
{ ENVELOPE = "$RETURNPATH  $MATCH" }
:0
* $ xDATE ?? ^^$WEEKDAYS, [0-9]+ \/$MONTHS
{ ENVELOPE = "$ENVELOPE $MATCH" }
:0
* $ xDATE ?? ^^$WEEKDAYS, \/[0-9]+
{ ENVELOPE = "$ENVELOPE $MATCH" }
:0
* $ xDATE ?? ()\/$TIMESTAMP
{ ENVELOPE = "$ENVELOPE $MATCH" }
# dates between 1980-2003 ok. Adjust as needed.
:0
* $ xDATE ?? $MONTHS \/(19[89][0-9]|200[0-3])
{ ENVELOPE = "$ENVELOPE $MATCH" }

:0 fhw
* $ ENVELOPE ?? $WEEKDAYS $MONTHS [0-9]+ $TIMESTAMP (19[89][0-9]|200[0-3])^^
| sed "s,^\(From \).*,\1$ENVELOPE,"
:0 a:
good

:0 e:
nogood


-- 
Email address in From: header is valid  * but only for a couple of days *
This is my reluctant response to spammers' unrelenting address harvesting


