procmail

Re: (Parsing dates and) persistent lock files that never go away

2004-07-23 18:20:01
* Dallman Ross <dman(_at_)nomotek(_dot_)com> [2004-07-23 17:05]:

Yes.  You do not need any pipes to extract the year.  Your 19/20
thing would add some complexity, but even so it could be done all
in procmail.  But first, I have to ask: why are you using the
string from the Date: headers, which in general is untrustworthy,
rather than the one from From_ (top line), which your server
(with or without procmail's help) generates and which is as
trustworthy as the clock on your server?  The point does seem
to be to archive the mail according to received-time, yes?
Even if you are backtesting old mail, the From_ header would
normally still be the old date.

That's a good point, and probably the best date to take from new
inbound mail.  But much of the old email that I'm refiltering has
corrupt From_ fields.  Some of it has no From_ field at all (because
it may have been delivered as a digest or my old MUA didn't bother to
save it).  Then I also have a stack of messages that have gone through
my defective scripts, and the From_ field has been stomped on with
LINEBUF overflows, and I deleted the original mailboxes before I
discovered this was happening.

In all of the above cases, formail constructs a new From_ line with
the current date.  It would be nice if I could tell formail to trust
the Date: field and construct the new From_ line from that date, but
that does not seem to be possible.

I was actually planning to complicate the 'year extractor' even more,
because sometimes the Date: field is empty, missing, or invalid.  In
that case I would want the YEAR variable to fall back to the year in
the date that trails at the end of the From_ line.

Yes, so why not use that to start?  You have just demonstrated
part of the untrustworthiness of the Date: header.

It seems I need to parse both dates, because the From_ field is more
accurate on new inbound mail, and the Date: field is more accurate on
old mail being reprocessed.
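
A rough, untested sketch of that fallback, assuming an earlier recipe
has already tried to set YEAR from the Date: header and that the From_
line ends in the usual ctime-style date with a four-digit year (the
guard on YEAR and the address-skipping pattern are my own guesses, not
something from the thread):

  # Run only if YEAR holds no digit yet, i.e. the Date: attempt failed.
  # "^From +[^ ]+ " steps over the sender address, so that digits in
  # the address are not mistaken for a year.
  :0
  * ! YEAR ?? [0-9]
  * ^From +[^ ]+ .*\/(19|20)[0-9][0-9]
  { YEAR = $MATCH }

For new inbound mail the two recipes could simply be swapped, so that
the From_ line is tried first and the Date: header is the fallback.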

If all those are rejected and you still want to use the Date: header's
asserted year, you can still easily do it in procmail only.
Here's a sample Date: header (from some spam I have lying around,
but heh):

   Date: Tue, 20 Jul 2004 11:47:42 +0000

Lots of ways to approach this, but the time should always be
there, so I'd use that since the colons are a nice anchor
or search object:

  :0
  * ^Date:.*\/(19|20)?[0=9][0-9][^a-z]+:
  * MATCH ?? ^^\/[^   ]+
  { YEAR = $MATCH }

Thanks for posting that.  I assume your [0=9] is really a [0-9]?
Also, are the compound-statement delimiters "{}" really necessary if
the block holds just a single command?
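
Assuming [0-9] is what was meant, the recipe would read as below.  I
have also narrowed the second condition to leading digits, which
should give the same result for a header like the sample.  The second
recipe is only my guess at the "19/20 thing": it prefixes a century
when just two digits were captured, and the cutoff at 70 is arbitrary.
Untested:

  # Capture from the year up to the last colon of the time,
  # then keep only the leading digits as YEAR.
  :0
  * ^Date:.*\/(19|20)?[0-9][0-9][^a-z]+:
  * MATCH ?? ^^\/[0-9]+
  { YEAR = $MATCH }

  # Hypothetical century fix: a two-digit YEAR of 70-99 becomes
  # 19xx, anything else two-digit becomes 20xx.
  :0
  * YEAR ?? ^^[0-9][0-9]^^
  {
    :0
    * YEAR ?? ^^[7-9]
    { YEAR = "19$YEAR" }

    :0 E
    { YEAR = "20$YEAR" }
  }

With the sample header above, MATCH after the first condition should
be "2004 11:47:", and YEAR should end up as 2004.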

By the way, if someone in a TZ not yours sends a message near
midnight at the end of the year, which year are you putting it in?

If it's an old message, I'll put it in whatever year the Date: field
gives.  If it's a new inbound message, I will use the year my server
has.  It doesn't matter much, because at the end of a year, I'll be
reading inboxes of both years.

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail