
Re: Regenerate From_ with Date in formail

2003-02-28 13:35:50
On 27 Feb, LuKreme wrote:
| On Thursday, Feb 27, 2003, at 20:11 Canada/Mountain, Don Hammond wrote:
| > I'm confused by the example given, and am relying on the description
| > just above it which seems clear enough to me.  However, I'd suggest
| > that the Date: header is not the best place to get the date from.  It
| > can be off by months or more from the actual delivery date which should
| > be contained in From_.  That date should be available in the last (top)
| > Received: header. That header is generated by the last machine  that
| > touches the message, so it is both reliable and predictable.
| 
| While it is possibly predictable, it is certainly not necessarily 
| reliable.  For example:
| 
|  From procmail-bounces(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE  Thu Feb 27 20:49:26 2003
| Return-Path: <procmail-bounces(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE>
| X-Original-To: kremels(_at_)localhost
| Delivered-To: kremels(_at_)localhost(_dot_)syth(_dot_)serveftp(_dot_)net
| Received: from southgaylord.com (localhost [127.0.0.1])
|          by syth.serveftp.net (Postfix) with ESMTP id 6A5D4A1CDE9
|          for <kremels(_at_)localhost>; Thu, 27 Feb 2003 20:49:26 -0700 (MST)
| Received: from [134.130.3.130] (HELO ms-dienst.rz.rwth-aachen.de)
|    by southgaylord.com (Stalker SMTP Server 1.8b9d14)
|    with ESMTP id S.0000567595 for <kremels(_at_)kreme(_dot_)com>; Thu, 27 Feb 2003 20:18:31 -0700
| 
| Since the top line is the Received: header of my home machine, and since
| that machine gets my mail via fetchmail, and since my fetchmail
| process was offline for a while, the From_ header shows the message
| as arriving at 0349 GMT when the message REALLY arrived at 0318 GMT.
| Depending on the stability of my connection, the last Received: header
| can be as much as 30 hours off from the correct time.
| 
| I would, occasionally, find it useful if I could rewrite the From_ 
| header to reflect, for instance, the date in the header
| 
| Received:.*by southgaylord.com
| 
| since my mailer sorts the messages based on the time stamp in the From_ 
| header.
| 
| Basically, the last mail server in the Received chain may NOT be the 
| most reliable date stamp.  Even in the best of circumstances the From_ 
| date stamp will reflect a delay of 4-6 minutes due to fetchmail's 
| processing.

By reliable I meant it's not subject to forgery/munging by the sender.
Maybe integrity would've been a better choice of words.  For your
purposes you may not *like* the way it's done, but the integrity of the
data is more reliable than any other header. Only you can mess with it.

The rest of this boils down to semantics. When does a message "arrive"?
You want to define arrived as when it gets to your ISP.  I consider that
one of the hops, where a Received: header is rightfully inserted, and
the final arrival to be when it's downloaded to your machine and the
last Received: header is inserted.  I run my own mail server and (even
before I did) have never used fetchmail so this isn't a distinction I've
ever worried about. I'll grant there is some logic for your view, though
I think it breaks down (at least for implementation) when considering how
to handle this differentiation between different kinds of hops and
Received: headers.  Even if the logic breaks down, that doesn't change
the fact that you have a legitimate personal desire for different
behavior.

Looking at it your way, I'd guess that the difference between the time
a message arrives at your ISP and the time fetchmail downloads it would,
on average, be greater than the difference between the time it was sent
and the time it arrives at your ISP.  In other words, fetchmail's polling
interval is likely to be greater than the time from sender to ISP.
On the other hand, for messages where the converse is true, the
discrepancy in the dates can be decades.  On average it might be
more reliable in your sense to use Date:, but the sample range will be
much wider, so incorrect dates will be off by a significantly
greater amount.  So which is better?  Is it having all the timestamps
off by a little, but never a lot?  Or is it having most of the
timestamps off by a little less, and sometimes a lot?  Obviously
there is more than one possible "right" answer depending on the user's
needs.

I'd guess my view is the conventional one, as both your headers and mine
show: the date/time stamp in From_ matches the topmost Received: header.
That's expected, even if not personally desired.  Which of the two is
"right" (either, neither, or both) is irrelevant when you consider you
can have it either way.  I stand by my answer to the original question,
which was to *regenerate* the envelope From_, which I take to mean
restoring its original condition; but it should be mostly trivial for
someone to adapt the same concept to *modify* the envelope to something
else that suits their needs.  I'd argue that modifying something like
what I suggested to use the next-to-last Received: header would be more
reliable by your definition.  Even considering the potential added
complexity of processing multiple Received: headers, that is arguably
more robust than trying to deal with the myriad date formats in the
Date: header.
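
To make the "pick a different Received: header" idea concrete, here is a
minimal sketch in Python rather than procmail/formail.  It is not the recipe
from earlier in the thread.  The names received_date, from_line and SAMPLE
are made up for illustration, and it assumes the timestamp follows the last
semicolon of a Received: header and that From_ uses an asctime-style date
with two spaces after the address, as in the headers quoted above.

```python
import re
import time
from email import message_from_string
from email.utils import parsedate_to_datetime

# Sample headers copied from the quoted message (addresses are left
# obfuscated exactly as the list archive shows them).
SAMPLE = """\
From procmail-bounces(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE  Thu Feb 27 20:49:26 2003
Return-Path: <procmail-bounces(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE>
Received: from southgaylord.com (localhost [127.0.0.1])
         by syth.serveftp.net (Postfix) with ESMTP id 6A5D4A1CDE9
         for <kremels(_at_)localhost>; Thu, 27 Feb 2003 20:49:26 -0700 (MST)
Received: from [134.130.3.130] (HELO ms-dienst.rz.rwth-aachen.de)
   by southgaylord.com (Stalker SMTP Server 1.8b9d14)
   with ESMTP id S.0000567595 for <kremels(_at_)kreme(_dot_)com>;
   Thu, 27 Feb 2003 20:18:31 -0700

body
"""


def received_date(raw_message, by_host=None, index=0):
    """Return the timestamp of one Received: header, or None.

    by_host: keep only headers containing "by <host>".
    index:   position among the remaining headers (0 = topmost).
    """
    msg = message_from_string(raw_message)
    headers = msg.get_all("Received") or []
    if by_host is not None:
        pattern = re.compile(r"\bby\s+" + re.escape(by_host) + r"\b", re.I)
        headers = [h for h in headers if pattern.search(h)]
    if index >= len(headers):
        return None
    # Assumption: the date is whatever follows the last semicolon.
    _, _, stamp = headers[index].rpartition(";")
    # Collapse folded whitespace before parsing the date.
    return parsedate_to_datetime(" ".join(stamp.split()))


def from_line(sender, when):
    """Build a From_ line using the wall-clock time in the header's own zone."""
    return "From {}  {}".format(sender, time.asctime(when.timetuple()))


if __name__ == "__main__":
    isp = received_date(SAMPLE, by_host="southgaylord.com")
    top = received_date(SAMPLE)
    print(from_line("procmail-bounces(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE", isp))
    print("delay added by the last hop:", top - isp)
```

With the sample headers, the header selected by "by southgaylord.com" is the
same one as the next-to-last (index=1), and the gap to the topmost header is
the 0349 vs. 0318 GMT difference described in the quoted message.  Feeding
the result back into the mailbox (via formail or otherwise) is left out here.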

-- 
Email address in From: header is valid  * but only for a couple of days *
This is my reluctant response to spammers' unrelenting address harvesting


