Re: remove duplicates on OLD mailboxes

2001-08-09 10:48:34
how do you tell formail to:

     take a BIG mailbox ( >25 MB)
     remove all duplicates
     put all messages sent during year YYYY into another new mailbox
     all this while:
             being tolerant because some messages are screwed
             (converted/copied from old MS mailboxes)
             formatting output properly to patch the problem above?

...................................

You don't - formail doesn't do what you want.  What you *do* is use
formail to slice an existing mailbox into pieces and then pass each
message in the mailbox to *procmail* to do the duplicate suppression
(using formail to maintain a cache of Message-IDs) and filtering on
date.  You'll use a command something like

formail -Y -s procmail -m pmrc

where pmrc is a procmail recipe file like .procmailrc that does the
various things you want.

I had always seen procmail using formail, not the other way around.

Uhhh...  the 2nd example in the formail man page ????

Also, note that that example indicates that my suggestion should
have included the -d flag in the toplevel formail invocation.  You
may also not need the -Y flag that I use, so my suggestion is
now

formail -ds procmail -m pmrc


Also, I was confused by what I found on this URL:

     http://www.linuxfaq.it/revisione/ldr5.html#index1616

which doesn't mention anything like you say.

If I understand the Italian correctly (a major leap of faith), the
reason that this referenced page suggests something different from
what I suggest is that the example script there *only* does the
duplicate suppression (it too is lacking the -d flag that I left
out).  My suggestion was to use formail and procmail in combination
to do what formail does not do alone.  Since I was suggesting that
you call procmail anyway, my solution was to have procmail call formail
to do the duplicate suppression during procmail processing (following
the 2nd example of an autoresponder in the procmailex(5) man page).
It is equally valid, and actually more efficient, to have the
outer formail suppress duplicates before passing individual messages
to procmail for further filtering.
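For illustration, here is a rough sketch of what a pmrc for the first
approach (procmail calling formail for the duplicate check) might look
like.  It is untested against your mailbox and makes several assumptions
that are mine, not part of the thread: the year is taken from the Date:
header, it is written with 4 digits, 1999 stands in for YYYY, and the
file names msgid.cache, year-1999.mbox, rest.mbox and pmrc.log are
made-up placeholders.  The duplicate recipe is modelled on the one in
procmailex(5); the cache size is simply the 200000 used elsewhere in
this thread.

```shell
#!/bin/sh
# Sketch only: write a hypothetical rcfile named pmrc into the current
# directory.  All file names and the year are placeholders.
YEAR=1999   # assumed example year; replace with the year you want

cat > pmrc <<EOF
# pmrc: duplicate suppression plus a filter on the Date: header.
# All relative paths below are resolved against MAILDIR.
MAILDIR=$PWD
LOGFILE=pmrc.log

# Feed only the header to formail; if the Message-ID is already in the
# cache, formail exits successfully and procmail treats the message as
# delivered, which drops the duplicate.
:0 Wh: msgid.lock
| formail -D 200000 msgid.cache

# Messages whose Date: header contains the 4-digit year go to one mbox.
:0:
* ^Date:.*[ ]${YEAR}[ ]
year-${YEAR}.mbox

# Everything else (other years, 2-digit years, missing or broken Date:
# headers) is kept in a second mbox for manual inspection.
:0:
rest.mbox
EOF

# Afterwards, something like this would run it over the old mailbox:
#   formail -ds procmail -m pmrc < mbox
```

Messages with 2-digit years or a mangled Date: header will not match the
condition and end up in rest.mbox, so check that file before throwing
anything away.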

At this point, I assume that, running straight on the command line
 
     formail -Y -s procmail -m pmrc

the polished mailbox  would be that defined in pmrc, right? 

Right, except now it should read

formail [-Y] -d -s procmail -m pmrc [< mbox]

where you may not need the -Y flag, and you either pipe the mbox to be
filtered into formail or use shell redirection.

If you prefer to have the outer formail suppress duplicates, this
changes to

formail [-Y] -D 200000 cache -d -s procmail -m pmrc [< mbox]

Last but not least, may I ask you to point me to recipes that

remove duplicates, *and* print out only messages in given year (where
year might be in 4 or 2 digits)?

From where do you want the year info to come?  The Date: header?  The
From_ header?
