procmail

Re: Parsing dates

2004-07-25 03:50:17
On Sat, 24 Jul 2004 21:06:57 -0600, Justin Gombos 
<mindfuq(_at_)zianet(_dot_)com> wrote:
* Google Kreme <gkreme(_at_)gmail(_dot_)com> [2004-07-24 19:37]:

I have the following date_fix.inc file, which was a collaborative
effort from several people here, including some minor fixes/features
by me.  I think you can find the threads by searching for
"WHICHRECVD".  I think the original 'bones' of the script were
generated by Don Hammond.

The script looks pretty elaborate.  

It is elaborate, but it does exactly what is needed: get the date from
a specific Received header (my server, whose time/date I can trust) and
reconstruct a valid From_ line based on that.

It kinda bums me out that I wasted
time writing my own.  I'm not sure whether to use it, because someone
pointed out that GNU date can be used for date extraction.

GNU date is not necessarily available though. 

I initially liked the idea of doing the date extraction in procmail
(as opposed to external apps), but GNU date looks real appealing for a
couple reasons:

First of all, the UNIX philosophy of many very small tools devoted to
very specific tasks, and working together.  It's a great philosophy,
and eliminates the possibility of repeating someone's effort - and
making mistakes along the way.  Someone worked hard to perfect GNU
date in date capture and parsing.

Also, interpreted instructions are slower than compiled instructions,
so efficiency-wise, GNU date is probably quicker, despite the overhead
of launching a child process.

Not necessarily.  Spawning a child is not cheap.

Does the procmail script you posted do anything that GNU date does
not?

They don't do the same thing at all.  One simply gets the year from the
Date: header and the other scans the Received headers for the right
timestamp and rebuilds a corrected From_ line based on it.
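
As a rough illustration of the Received-header half of that (this is
not the date_fix.inc script, just a sketch): by convention the
timestamp in a Received header is whatever follows the last semicolon.
The function name received_stamp is made up for this example, and it
assumes you pass it the text of a single Received header, possibly
folded over several lines.

```shell
# Hypothetical helper: print the timestamp part of one Received header.
# Assumes the date follows the last semicolon, as is conventional.
received_stamp() {
    # Unfold continuation lines into one line, then drop everything up
    # to and including the last semicolon, plus surrounding whitespace.
    printf '%s\n' "$1" | tr '\n' ' ' |
        sed -e 's/.*;[[:space:]]*//' -e 's/[[:space:]]*$//'
}
```

Picking which Received header to trust (the one added by your own
server) is the part this sketch leaves out.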

In all of the above cases, Formail constructs a new From_ line
with the current date.  It would be nice if I could tell Formail
to trust the Date field,

No, the Date: field is wholly untrustworthy.

In my case, I have inbound mail that was composed in the '90s,
(because I'm filtering old mail) and the From_ field shows July 2004,
which was added by Formail, and it's a useless date to me.  There are
no received headers on much of the mail, so my only source for a
realistic date is the Date: field.

Well, you can adapt the script I posted to scan the Date header then. 
It will be hard because you can't predict the Date: header format with
certitude. Try doing a grep -e "^Date:" on your mail and just do a
quick visual scan to see how the formats stack up.

Here's a sample from my own mailboxes of some differing formats:

List Replies:Date: Tue, 02 Mar 2004 11:24:15 +1300
List Replies:Date: Wed, 03 Mar 2004 19:16:00 -0600
List Replies:Date: Wed, 3 Mar 2004 20:20:15 -0500 (EST)
List Replies:Date: Fri, 19 Mar 2004 05:50 -0800
List Replies:Date: Thu,  1 Apr 2004 23:11:29 +1000  
    (NB: there's a pad space before the 1)
List Replies:Date: 6 Apr 2004 10:32:04 -0000
List Replies:Date: 22 Mar 2004 10:08:02 UT
List Replies:Date: Wed, 7 Apr 2004 09:18:03 -0400 (GMT-04:00)
List Replies:Date:         Thu, 22 Jul 2004 16:13:27 +0400
   (NB: Yes, all those spaces)

are just a few of the formats I ran into.  I don't know how well GNU
date will handle all of those formats (I suspect it might just be able
to), but if you can do something like

date -d "`formail -zx Date:`" "+%a %b %d %H:%M:%S %Y"

and get a result then that's great.  I don't have access to GNU date
myself, so the point is a bit moot for me. (dunno how it would get
along with FreeBSD anyway).
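
For anyone who does have GNU date, here is a small sketch of how the
conversion could be wrapped up. It assumes GNU date (-d is a GNU
extension), the name from_stamp is made up, and I have not run the
sample formats above through it, so treat it as a starting point
rather than a tested recipe.

```shell
# Hypothetical helper: turn a Date: header value into a From_-style
# timestamp. Assumes GNU date. Prints nothing and returns non-zero
# when date cannot parse the value.
from_stamp() {
    # -u reports the time in UTC, so the result does not depend on
    # the local time zone of the machine doing the filtering.
    LC_ALL=C date -u -d "$1" '+%a %b %d %H:%M:%S %Y' 2>/dev/null
}
```

With a message on stdin it could be called as
from_stamp "`formail -zx Date:`" and the exit status checked before
trusting the result; a failure would be the cue to fall back to some
other source for the date.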

-- 
gkreme at gmail or kreme at kreme or syth at mac

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail