Re: archive by months and years

1999-11-04 15:05:20
Your script, as written, will dutifully extract the date header, parse
it, and spool to the wrong directory, because you relied on bad data.  

In virtually all cases, it's easier, safer, and more accurate to store
messages based on the time they arrive, not when they say they were
sent.
I agree.  On lists that I archive, I pre-process a lot of the messages
to do various things (remove certain attachment types, check for known
viruses, etc.) with procmail.  I then split the messages up by month
with a procmail recipe as well.
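
For what it's worth, a monthly split along those lines can be as small as
the sketch below.  The variable name MONTHFOLDER and the archive/ path are
made up for illustration, and it assumes your date(1) accepts +%Y-%m:

   # Folder name comes from the local clock at delivery time,
   # e.g. 1999-11 (MONTHFOLDER is a hypothetical name)
   MONTHFOLDER = `date +%Y-%m`

   # Deliver to a per-month mbox under $MAILDIR, with a local lockfile
   :0:
   archive/$MONTHFOLDER

Because the folder name is taken from the receiving machine's clock, the
Date: header in the message never enters into it.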

One pre-processing step you could add renames the original Date: header
to 'Old-Date:', then creates a new Date: header from the timestamp in the
envelope From line (the mbox "From " line).  Here's a procmail recipe that
does just that:
   TIMEZONE = "-0600 (CST)"

   # Note: each pair of brackets below contains a space and a tab
   :0
   * ^From[     ]+[^    ]+[     ]+\/[^  ]+.*
   {
      FROMDATE = $MATCH

      # weekday (variable name assumed)
      :0
      * FROMDATE ?? ^()\/[A-Za-z]+
      { WEEKDAY = $MATCH }

      :0
      * FROMDATE ?? ^[A-Za-z]+\<+\/[A-Za-z]+
      { MONTH = $MATCH }

      :0
      * FROMDATE ?? ^[A-Za-z]+\<+[A-Za-z]+\>+\/[0-9]+
      { DAY = $MATCH }

      :0
      * FROMDATE ?? ^[A-Za-z]+\<+[A-Za-z]+\>+[0-9]+\<+\/[0-9:]+
      { TIME = $MATCH }

      :0
      * FROMDATE ?? ^[A-Za-z]+\<+[A-Za-z]+\>+[0-9]+\<+[0-9:]+\>+\/[0-9]+$
      {
         YEAR = $MATCH
         NEWDATE = "$WEEKDAY, $DAY $MONTH $YEAR $TIME $TIMEZONE"

         # formail -i renames the existing Date: header to Old-Date:
         :0 fhw
         | formail -i "Date: $NEWDATE"
      }
   }

It's probably easier and more extensible to do the same thing in perl,
though, since the procmail recipe above has to make a system call to
formail anyway:

   | perl -e 'while (<STDIN>) { $newdate = "$1, $3 $2 $5 $4 -0600 (CST)" if /^From\s+\S+\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)/; if (/^Date:\s+/ && defined($newdate)) { print "Date: $newdate\nOld-$_"; } else { print $_; } }'

Now there should be no confusion about dates, nor should it matter what you
split on...

I'm not sure I would recommend using these, but if you're concerned about
people being confused over dates and want to stick with the only known
quantity, this should fit the bill.

