Re: Converting from MH to mbox?

2004-02-11 02:13:16
On 09 Feb 2004, at 15:39, Don Hammond wrote:
On  9 Feb, David W. Tamkin wrote:
| Kreemy followed up,
|
| > Current date and time is not acceptable. Some of this email goes back
| > 10 years.
|
| ... and I gather you needed the original delivery timestamp in From_.
| Maybe you didn't; I couldn't have known either way until you said.

Well, having the current date in the From_ line wasn't going to work; it had to be at least reasonably close to the message's real date. The only date that I can trust for certain is the date on MY server.

Interestingly, his Kremliness followed up with another implementation,
so maybe he could use his own code.

Yep, as it worked out, that is what I had to do. I was really hoping for a faster solution, as it took several hours to pump 250,000 mails through that procmail recipe just to get the correct date format in the From_ line, although since I was there I filtered all the mail too.

The packf script from nmh, much to my disgust, simply used the current date to generate the From_ line, so that was not a solution. It *was* worthwhile, though, because it did yield mbox files, albeit with bad From_ lines, so they all appeared as a single message until I procmailed them. However, the From_ lines were at least in the right format, so the mbox files were readable by mutt and mail.app.

Looks like Gary's shell script would have done the job as well, and a lot faster I am sure, but I was already done by the time I received that message in my spool.

I didn't look at the differences.  At the time there was a debate
between using Received: and Date: headers. I know mine used the topmost
Received: header.  I don't know about LuKreme's.

Mine uses a specific Received: header, which is defined in the script. In my case it is

WHICHRECVD = 'by (mail.)?(southgaylord.com|covisp.net)'

I've found Date: headers to be wildly unreliable (I have more than a few messages with dates of 2038, for example, along with many from 1904 and 1969).
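To spot-check which timestamp a pattern like WHICHRECVD would pick out of a single message before pushing a whole archive through procmail, a plain shell sketch along these lines may help. The function name rcvd_stamp is made up for illustration, the host pattern is passed with [.] instead of the bare dots used in the recipe, and it assumes an awk plus a grep that supports -E and -o (GNU and BSD grep both do). It is not part of the original script.

```shell
# rcvd_stamp HOST_REGEX
# Reads one message on stdin and prints the date stamp found in the first
# Received: header whose text matches HOST_REGEX (an extended regex).
# Hypothetical helper, not taken from the original recipe.
rcvd_stamp() {
    awk -v host="$1" '
        # Print the header collected so far if it is the first matching Received:
        function emit() {
            if (!done && hdr ~ /^Received:/ && hdr ~ host) { print hdr; done = 1 }
        }
        /^$/     { exit }                     # blank line ends the headers; END still runs
        /^[ \t]/ { hdr = hdr " " $0; next }   # folded continuation line: unfold it
                 { emit(); hdr = $0 }         # a new header starts here
        END      { emit() }
    ' | grep -Eo '(Sun|Mon|Tue|Wed|Thu|Fri|Sat), [0-9]+ (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}'
}
```

Usage would be something like `rcvd_stamp 'by (mail[.])?(southgaylord[.]com|covisp[.]net)' < one-message`, where one-message is a placeholder file name. Unlike the recipe, this does not restrict the year range, so it only shows what the trusted hop recorded.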

On 09 Feb 2004, at 17:56, Dallman Ross wrote:
One thing that occurs to me is that if the original folders are actually intact from the original delivery, then their file stamp is a good source for ascertaining the date with a correct format. GNU date has an option to apply the date from a specific file stamp. I actually use this when I'm touching (timestamping) my whitelist file-hash.

Not a bad idea, but not reliable in this case. The files were a manual cp backup of an existing mailspool. All the files had (near) identical timestamps.
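For folders whose file timestamps did survive, the file-stamp idea could be sketched roughly as follows. This assumes GNU date, where -r FILE prints the file's modification time (on BSD date, -r takes seconds since the epoch instead), and the function name from_stamp is made up for illustration.

```shell
# from_stamp FILE
# Prints FILE's modification time in the asctime-style layout used on
# mbox From_ lines, e.g. "Wed Feb 11 02:13:16 2004".
# Assumes GNU date; LC_ALL=C keeps the day and month names in English.
from_stamp() {
    LC_ALL=C date -r "$1" '+%a %b %e %H:%M:%S %Y'
}
```

Note that %e pads single-digit days with a space, which differs slightly from the single-space layout the recipe builds.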

What I did:

packf to pack the directories into a bigbox
formail -s procmail -m daterc < bigbox

daterc contained:

# Fallback MYDATE in reverse format
MYDATE=`date '+%Y-%m'`
#Used whilst testing
#MLDIR=$HOME/temp
MLDIR=$HOME/Mail
NL="
"

:0 h
CLEANFROM=|formail -IReply-To: -rtzxTo:

# Slightly altered version of my original DATEMUNGE script
# This version does not depend on the Return-path since far
# too many of the messages in question did not have a
# Return-path header.

WEEKDAYS = '(S(un|at)|Mon|T(ue|hu)|Wed|Fri)'
MONTHS = '(J(an|u[ln])|Feb|Ma[ry]|A(pr|ug)|Sep|Oct|Nov|Dec)'
WHICHRECVD = 'by (mail.)?(southgaylord.com|covisp.net)'
# Years appears in several different forms in the original post.
YEARS = '(199[0-9]|20[0-3][0-9])'
TIMESTAMP = '([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9]'
RCVD_STAMP = "$WEEKDAYS, [0-9]+ $MONTHS $YEARS $TIMESTAMP"

:0
* $ ^Received:.*$WHICHRECVD.*\/$RCVD_STAMP
{ xDATE = "$MATCH" }

:0fw
* xDATE ?? ^^^^
| formail -i"X-Date-Extract: FAILED"

:0 E
{
   :0
   * $ xDATE ?? ^^\/$WEEKDAYS
   # NB, there must be TWO spaces before the $MATCH
   # for a proper FROM_      vv
   { ENVELOPE = "<$CLEANFROM>  $MATCH" }
   :0
   * $ xDATE ?? ^^$WEEKDAYS, [0-9]+ \/$MONTHS
   {
      ENVELOPE = "$ENVELOPE $MATCH"
      MYMONTH=$MATCH
   }
   :0
   * $ xDATE ?? ^^$WEEKDAYS, \/[0-9]+
   { ENVELOPE = "$ENVELOPE $MATCH" }
   :0
   * $ xDATE ?? ()\/$TIMESTAMP
   { ENVELOPE = "$ENVELOPE $MATCH" }

   :0
   * $ xDATE ?? $MONTHS \/$YEARS
   {
      ENVELOPE = "$ENVELOPE $MATCH"
      MYYEAR=$MATCH
   }

   # Make sure the $ENVELOPE matches the desired format
   # If it does, rewrite the From_
   :0 fhw
   * $ ENVELOPE ?? $WEEKDAYS $MONTHS [0-9]+ $TIMESTAMP $YEARS^^
   | sed "s,^\(From \).*,\1$ENVELOPE,"
}

      # An ugly ugly kludge
      TEMPMON = $MYMONTH
      :0
      * 1^0 $TEMPMON ?? JAN
      * 2^0 $TEMPMON ?? FEB
      * 3^0 $TEMPMON ?? MAR
      * 4^0 $TEMPMON ?? APR
      * 5^0 $TEMPMON ?? MAY
      * 6^0 $TEMPMON ?? JUN
      * 7^0 $TEMPMON ?? JUL
      * 8^0 $TEMPMON ?? AUG
      * 9^0 $TEMPMON ?? SEP
      * 10^0 $TEMPMON ?? OCT
      * 11^0 $TEMPMON ?? NOV
      * 12^0 $TEMPMON ?? DEC
      {
         MYMONTH = $=
         PADM = "0"$MYMONTH

         # I'm kinda proud of this bit, though I suspect
         # there is a MUCH more efficient way to do it.
         # gives  a 0-padded month 01-12
         :0
         * ! PADM ?? \/...
         {
            :0
            * PADM ?? ^^\/..
            { MYMONTH = $MATCH }
         }

      }

MYDATE=$MYYEAR-$MYMONTH

The script then continued with Sean's 'universal' list processor, and the MYDATE value was used to generate a folder name of the form $MYDATE.$LISTNAME (e.g., 2004-02.procmail). For the record, this recipe had a 0 fallback to the MYDATE format of 2004-02.
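For comparison, the reordering and month-number steps of the recipe can be sketched in plain shell. This is only an illustration of the same transformations, not the code that was run: the function names from_date, month_num and folder_date are hypothetical, and the input is assumed to be a stamp in the "Www, D Mmm YYYY HH:MM:SS" layout that RCVD_STAMP matches.

```shell
# Hypothetical helpers; the names are illustrative and not from the original recipe.

# from_date "Mon, 9 Feb 2004 15:39:01" -> "Mon Feb 9 15:39:01 2004"
# Reorders a Received:-style stamp into the From_ layout the recipe builds.
from_date() {
    printf '%s\n' "$1" | sed -E \
        's/^([A-Za-z]{3}), ([0-9]+) ([A-Za-z]{3}) ([0-9]{4}) ([0-9:]{8})$/\1 \3 \2 \5 \4/'
}

# month_num Feb -> 02 (case-insensitive); returns 1 for anything else
month_num() {
    case $(printf '%s' "$1" | tr 'A-Z' 'a-z') in
        jan) echo 01 ;; feb) echo 02 ;; mar) echo 03 ;; apr) echo 04 ;;
        may) echo 05 ;; jun) echo 06 ;; jul) echo 07 ;; aug) echo 08 ;;
        sep) echo 09 ;; oct) echo 10 ;; nov) echo 11 ;; dec) echo 12 ;;
        *) return 1 ;;
    esac
}

# folder_date "Mon, 9 Feb 2004 15:39:01" -> "2004-02"
folder_date() {
    set -- $1                        # word-split into: Mon, 9 Feb 2004 15:39:01
    m=$(month_num "$3") || return 1  # fail if the month name is not recognised
    printf '%s-%s\n' "$4" "$m"
}
```

Inside a procmailrc, such a helper could in principle be called with backticks, at the cost of one extra process per message, which is the kind of overhead the all-procmail kludge avoids.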

--
Why live in the world when you can live in your head?

