
Duplicated messages not detected (Was: how to sort old mailboxes)

2001-08-09 15:38:17

First of all, thank you very much for all your assistance!

Thanks to your help, I've already found a way to considerably reduce
the volume of email I have to sort, and I have found another problem;
see below.

I use the following script:

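#! /bin/sh
# Walk every file under $1, split it into single messages with
# formail, drop messages already listed in the cache, and hand the
# rest to procmail.
for it in `find "$1" -type f`
do
    echo "$it"
    cat "$it" | formail -D 4000000 /tmp/cache.tmp -Y -s procmail -m pmrc_step_1
done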

where pmrc_step_1 is the procmail recipe file attached to this
message. Of course, any comment on the formail line above is more than
welcome.

The recipe file is meant to:

   delete all old mailing list messages, since they are available
   online anyway. 

   put all mail with a Date:.*YYYY header into a YYYY folder for a
   second round of filtering

   put all mail with no four-digit year (00/01/99/etc.) in another
   folder for a later check

I have saved your replies on disk and will study them to see if I
can do better. I can confirm that at least some of the mailboxes are
messed up, so it's better to be flexible.


This is somewhat the mirror image of the "essential headers" thread
seen on this list last July.

I have realized that many messages don't have all headers (God knows
why...), and that, after the first round of sorting, there are many,
many messages which *are* equal (as in "same body and attachments,
same From, To and Date headers") but whose headers are either in a
different order or almost, but not quite, equal. Example: "Status: RO"
vs "Status: O". In other words, they *are* equal for all practical
human purposes, but how can I get formail/procmail to understand this
and save only one copy?
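
To make "equal" concrete, here is a minimal sketch of the comparison I
have in mind. It is not a tested solution: it assumes one message per
file, it ignores folded (continued) header lines, and the function
name msgkey is made up for illustration. As far as I understand,
formail -D keys on the Message-ID header alone, which is why it cannot
catch these copies when that header is missing.

```shell
#!/bin/sh
# msgkey FILE: print a checksum over the From:, To: and Date: headers
# (sorted, so their order does not matter) followed by the body.
# Assumption: FILE holds exactly one message. Every other header,
# such as Status:, is ignored. "msgkey" is a hypothetical name.
msgkey() {
    {
        # Header part: keep the three headers, stop at the first empty line.
        awk '/^$/ { exit } /^(From|To|Date):/ { print }' "$1" | sort
        # Body part: everything after the first empty line.
        awk 'body { print } /^$/ { body = 1 }' "$1"
    } | cksum
}
```

Two files for which msgkey prints the same value would count as the
same message, whatever their Status: line or header order. cksum is
only a weak checksum; a stronger digest tool could be swapped in where
one is installed.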


                        Marco Fioretti
Don't you wish you had more energy... or less ambition?