procmail

Duplicated messages not detected (Was: how to sort old mailboxes)

2001-08-09 15:38:17
Hello,

First of all, thank you very much for all your assistance!

Thanks to your help, I've already found a way to considerably reduce
the volume of email I have to sort, but I have also run into another
problem (see below).

I use the following script:

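#! /bin/sh
# Start every run with an empty Message-ID cache for formail -D.
> /tmp/cache.tmp
# formail drops messages whose Message-ID is already in the cache, splits
# each mailbox (-s) and hands every message to procmail with step-1 rules.
for it in `find "$1" -type f`
    do
    echo "$it"
    formail -D 4000000 /tmp/cache.tmp -Y -s procmail -m \
        /home/marco/MAIL_SORT_20010809/pmrc_step_1 < "$it"
    done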

where pmrc_step_1 is the procmail recipe file attached to this
message. Of course, any comments on the formail line above are more
than welcome.
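formail -D keys its cache on the Message-ID: header, so copies that lack a Message-ID, or that carry different ones, cannot be recognized as duplicates by it. A quick way to see how many such messages there are is to list them. This is a minimal sketch, assuming the messages have already been split one per file into a directory (for example an MH folder); `list_no_msgid` is a hypothetical helper name.

```shell
#!/bin/sh
# list_no_msgid DIR (hypothetical helper): print every file in DIR whose
# header block has no Message-ID: line. Assumes one message per file.
list_no_msgid() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        # Search only the headers (up to the first empty line), ignoring case.
        if ! sed '/^$/q' "$f" | grep -qi '^message-id:'; then
            printf '%s\n' "$f"
        fi
    done
}
```

A Message-ID that appears only in the body (e.g. in a quoted message) is deliberately not counted, because formail would not see it there either.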

The recipe file is meant to:

   - delete all old mailing list messages, since they are available
     online anyway;

   - put all mail with a Date:.*YYYY header into a YYYY folder for a
     second round of filtering;

   - put all mail with no four-digit year (00, 01, 99, etc.) into
     another folder for later checking.
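To spot-check single saved messages before trusting the recipe, the Date-based rule above can be mimicked in plain sh. This is a minimal sketch, not the contents of pmrc_step_1: `year_of` is a hypothetical helper, it assumes one message per file, looks only at the first Date: header, and accepts only years 1900 to 2099 written as a separate four-digit word.

```shell
#!/bin/sh
# year_of FILE (hypothetical helper): print the four-digit year found in the
# first Date: header of a one-message file, or "noyear" if there is none.
year_of() {
    # Keep only the header block: everything up to the first empty line.
    sed '/^$/q' "$1" | awk '
        # Examine only the first Date: header (case-insensitive).
        !seen && tolower($0) ~ /^date:/ {
            seen = 1
            for (i = 2; i <= NF; i++) {
                f = $i
                sub(/[,.]$/, "", f)      # tolerate "2001," or "2001."
                if (f ~ /^(19|20)[0-9][0-9]$/) { print f; found = 1; exit }
            }
        }
        END { if (!found) print "noyear" }'
}
```

Two-digit years such as "09 Aug 01" fall through to "noyear", which matches the third rule in the list.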

I have saved your replies to disk and will study them to see whether
I can do better. I can confirm that at least some of the mailboxes are
messed up, so it's better to be flexible.

NOW THE SECOND PROBLEM:

This is somewhat the mirror image of the "essential headers" thread
seen on this list last July.

I have realized that many messages don't have all their headers (God
knows why...), and that, after the first round of sorting, there are
many, many messages that *are* identical (same body and attachments,
same From, To and Date headers) but whose headers are either in a
different order or almost, but not quite, equal, e.g. "Status: RO" vs
"Status: O". In other words, they *are* equal for all practical human
purposes, but how can I get formail/procmail to recognize this and
save only one copy?
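One way to treat such copies as the same message is to ignore the header block except for a few chosen fields, and compare a key built from those fields plus a checksum of the body. This is a sketch under stated assumptions, not a formail or procmail feature: `hdr_value`, `msg_key` and `find_dups` are hypothetical names; it assumes one message per file (e.g. an MH folder); it uses From:, To: and Date: as the identifying headers and ignores folded continuation lines; and it uses POSIX `cksum`, a CRC, so the header values in the key do much of the work of keeping distinct messages apart.

```shell
#!/bin/sh
# hdr_value FILE NAME: first value of header NAME in FILE's header block,
# case-insensitive, with tabs turned into spaces and runs of spaces squeezed.
hdr_value() {
    sed '/^$/q' "$1" | grep -i "^$2:" | head -n 1 |
        sed 's/^[^:]*:[[:space:]]*//; s/[[:space:]]*$//' | tr '\t' ' ' | tr -s ' '
}

# msg_key FILE: "From|To|Date|body-checksum". Header order and headers
# such as Status: do not affect the key.
msg_key() {
    body_sum=$(sed '1,/^$/d' "$1" | cksum)
    printf '%s|%s|%s|%s\n' "$(hdr_value "$1" From)" "$(hdr_value "$1" To)" \
        "$(hdr_value "$1" Date)" "$body_sum"
}

# find_dups DIR: print each file whose key already appeared in an earlier
# file (files are visited in glob order); the first copy is never printed.
find_dups() {
    seen=$(mktemp) || return 1
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        key=$(msg_key "$f")
        if grep -Fqx -- "$key" "$seen"; then
            printf '%s\n' "$f"
        else
            printf '%s\n' "$key" >> "$seen"
        fi
    done
    rm -f "$seen"
}
```

The sketch only reports the later copies; reviewing and deleting them is deliberately left manual, since a wrong choice of key headers could otherwise discard genuinely different mail.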

                Ciao,

                        Marco Fioretti
-- 
Don't you wish you had more energy... or less ambition?
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail