Re: Filtering already saved messages


On Tue, 19 May 1998, era eriksson wrote:

 > -Each message that is looked at by formail is compared to the msgid.cache
 > file.  If the messageid for that message is not present in that file, the
 > message id is recorded into the msgid.cache file and then allowed to pass
 > on to the rest of the recipes in .procmailrc.  If the messageid is in the
 > msgid.cache file formail then considers that message a duplicate message
 > and then does what with it??  Sends it to /dev/null or bounces it back to
 > the sender or what?  Can I choose what I want it to do with the message?
 > For example..  if it is a duplicate but I still want to see them, can I
 > tell formail to pass all duplicates to a mailbox called Dupes?  I don't
 > know what the syntax would be for things like that.  Chances are I'm just
 > going to want formail to get rid of the message, but it's still nice to
 > know my options.

It can be modified but it's sort of non-trivial because the example
works with side effects (see the FAQ for an explanation). Several of
the tutorials tell you how to do it differently.

    :0h:msgid.lock
    * ? formail -D 8192 msgid.cache
    Dupes


OK, that worries me a little.  :)  Basically, I think I'd initially like to
run this recipe with all messages found as dupes going to the Dupes file,
because I want to see which messages formail considers duplicates.
One last thing I'm wondering: what are the "side effects" you're
mentioning?  Like I said, I'm anal about my mail.  No hurt my mail, no hurt
my mail!  :)
Also, what happens if the message is not a duplicate?  Will it still be
sent to the Dupes file before getting passed on to procmail to be filtered
into another mailbox?  Or will formail send it right past the "Dupes" line?

 > -I have literally thousands of messages..  is an 8k cache size enough?

It's just a circular chain, it fits in as many as there is room for.
With an average message-id length around 24 characters (I think that's
about reasonable), you'd have room for some 340 messages in the cache.
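
As a rough check of that estimate, the capacity is just the cache size
divided by the bytes each entry takes. A minimal sketch in Python; the
function name and the optional one-byte separator per entry are assumptions
for illustration, not something taken from the formail documentation.

```python
def cache_capacity(cache_bytes, avg_id_len, separator_bytes=0):
    """Approximate number of message-ids that fit in the cache.

    separator_bytes is an assumed per-entry overhead (0 reproduces the
    plain division used in the estimate above).
    """
    return cache_bytes // (avg_id_len + separator_bytes)


print(cache_capacity(8192, 24))       # 341, i.e. "some 340 messages"
print(cache_capacity(8192, 24, 1))    # 327 if each entry costs one extra byte
print(cache_capacity(1024000, 24))    # 42666 for the proposed 1 meg cache
```

Real message-ids are often longer than 24 characters, so the true figure
may well be lower.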


Right.  I understand what a cache like this is.  First in, first out.  But
for a COMPLETE comparison of both mailboxes, ALL the message-ids must fit
within the size I set for the cache file.  I figure a meg is pretty safe.
The worst that will happen is I look at the cache file, see it's exactly a
meg in size, assume there was some overspill, and just redo the process
after setting it to 2 megs or something.
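
To illustrate why the cache has to hold every message-id for a complete
comparison, here is a toy model in Python of a size-limited, first-in
first-out id cache. It is a simplified sketch, not formail's actual
implementation; the function name, the one-byte-per-entry overhead, and the
example ids are all made up.

```python
from collections import deque


def find_duplicates(message_ids, cache_bytes):
    """Return the ids flagged as duplicates by a size-limited FIFO cache."""
    cache = deque()   # oldest id on the left, newest on the right
    used = 0          # bytes currently occupied (assumed: id length + 1)
    dupes = []
    for mid in message_ids:
        if mid in cache:
            dupes.append(mid)      # seen before and still remembered
            continue
        cache.append(mid)          # new id: record it
        used += len(mid) + 1
        while used > cache_bytes and cache:
            # Out of room: forget the oldest ids first.
            used -= len(cache.popleft()) + 1
    return dupes


ids = ["<a@x>", "<b@x>", "<c@x>", "<a@x>"]
print(find_duplicates(ids, 100))  # ['<a@x>']  cache is big enough
print(find_duplicates(ids, 12))   # []         <a@x> was pushed out, dupe missed
```

In this model a cache that is too small never flags a wrong message; it
only misses duplicates whose first copy has already been pushed out.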

 > Perhaps I should make it 1024000 (1meg) in size just to be safe?
 > If the cache gets to be around 100k or so though..  how slow of a
 > processing am I going to endure?  Especially since it's going to be
 > creating and deleting a lock file for what looks like every message..

Haven't timed that, play around with a few hundred. It's a one-off job
anyway, right?


Yeah, you're right.  This project has already lasted about 4 days.  So I can
wait.  :)

 > Ah ok cool.  If mush does the job I'm going for it.  How do you mean
 > though that it's not particularly elegant?  Just that it was designed just
 > for the task?  As long as it does the job perfectly that's fine with me.

Not many people use mush anymore and it's not exactly intuitive the
first time around.


Ah, that sucks.  It's funny that PINE and Eudora can do all kinds of
things, but I can't change a flag back to unread.  :)

 > Speaking of which..  is there a program out there that can manipulate the
 > read/unread flag on emails in a mbox?

Pine and other abominations use the Status: header field for that,
which you can certainly modify to suit your taste.
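
As one possible way to edit that field in bulk, here is a minimal sketch
using the mailbox module from Python's standard library, which maps the
Status: header onto flags (R = read, O = old). The function name is
hypothetical, and it assumes the mail client follows the usual Status:
convention; try it on a copy of the mailbox first.

```python
import mailbox


def mark_all_unread(mbox_path):
    """Clear the R (read) flag on every message; return how many changed."""
    box = mailbox.mbox(mbox_path)
    box.lock()                        # keep other programs out while rewriting
    changed = 0
    try:
        for key in box.keys():
            msg = box[key]
            if "R" in msg.get_flags():
                msg.remove_flag("R")  # rewrites the Status: header
                box[key] = msg        # store the modified message back
                changed += 1
        box.flush()                   # write the changes to disk
    finally:
        box.unlock()
        box.close()
    return changed
```

Calling mark_all_unread("copy-of-inbox") would turn "Status: RO" into
"Status: O" on each message and leave everything else alone.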


Hmm.  I didn't see anything like that.  If that was in Pine, why did you
recommend mush?


Thanks much,

-Matt