procmail

generic questions

2003-12-08 14:22:47
I don't need or want anyone to write me a recipe for this (I'd like to 
figure out how myself).  I just need to know what tools (if any) in the 
procmail family are appropriate.  However, if this has all been done already,
a pointer to it would be appreciated.


I am trying to set up Bayesian learning for SpamAssassin by building ham and
spam corpora from my existing email.

The problems.

1) My "ham" emails are in hundreds of separate mbox files.  Can they simply be
"catted" together, or do I need to run procmail with a single recipe to file
them in a new location?

2) Can procmail help me weed out duplicates in this file?  I already use it to
remove duplicates in my normal rc file, but my historical mail has all been
through this procmail recipe once before.  I normally use this:

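  # Drop duplicates: formail -D exits 0 if this Message-ID is already in
  # msgid.cache (else it records the ID and fails, so procmail continues).
  :0 Wh: msgid.lock
  | formail -D 8192 msgid.cache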

Will running my existing mbox folders through this again result in either
 a) confusion for my regular mail?
   or
 b) skipping any messages seen previously?

Should I just change msgid.cache to msgid2.cache to avoid this issue?

3) Is there any mechanism in procmail to help me keep only the most recent
(n*1000) emails in this file?
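All three steps can also be sketched outside the procmail family. The function below is a minimal sketch using Python's standard `mailbox` module (a swapped-in tool, not procmail or formail). It concatenates the given mbox files in order, parsing message boundaries instead of relying on a plain `cat`. It skips any message whose Message-ID was already seen in this run, using an in-memory set, so the `msgid.cache` used by the normal rc file is never read or written. Finally, it keeps only the last `limit` messages. The name `build_corpus`, keeping the first copy of a duplicate (as `formail -D` does), always keeping messages that have no Message-ID, and the assumption that the files are listed oldest first and are not modified while it runs are all assumptions of this sketch.

```python
import itertools
import mailbox


def build_corpus(sources, dest_path, limit):
    """Merge the mbox files in `sources` (oldest first) into dest_path
    (appending if it already exists), skipping repeated Message-IDs and
    keeping only the last `limit` messages.

    Returns the number of messages written.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    seen = set()
    picked = []  # (source path, message key) of every message worth keeping
    for path in sources:
        box = mailbox.mbox(path, create=False)
        try:
            # mbox keys are integers assigned in file order.
            for key in sorted(box.keys()):
                msgid = box[key].get("Message-ID")
                # Messages without a Message-ID cannot be compared; keep them.
                if msgid is not None:
                    msgid = str(msgid).strip()
                    if msgid in seen:
                        continue  # keep only the first copy, like formail -D
                    seen.add(msgid)
                picked.append((path, key))
        finally:
            box.close()

    picked = picked[max(len(picked) - limit, 0):]  # newest `limit` entries

    dest = mailbox.mbox(dest_path)  # created if it does not exist
    try:
        # Reopen each source once per run of consecutive picks from it.
        for path, group in itertools.groupby(picked, key=lambda item: item[0]):
            box = mailbox.mbox(path, create=False)
            try:
                for _, key in group:
                    dest.add(box[key])
            finally:
                box.close()
    finally:
        dest.close()  # flushes pending writes
    return len(picked)
```

For example, `build_corpus(sorted(glob.glob("ham/*")), "ham.mbox", 3000)` would build a 3000-message ham corpus. The `ham/` directory, output name and limit are placeholders, `glob` must be imported, and `sorted()` orders by file name, which is only oldest-first if the files are named that way.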


Going forward, I assume I'll find a method in the docs for filing each
message in its usual place while also copying it into the right corpus,
but the problem of keeping the corpora free of duplicates and retaining
only the most recent (n*1000) messages remains.
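Going forward, both rules (no repeated Message-IDs, at most (n*1000) messages) could instead be enforced each time a message is added to a corpus. The function below is a hypothetical sketch using Python's standard `mailbox` module rather than anything in the procmail family. It skips the message if its Message-ID is already in the corpus; otherwise it appends it and removes the oldest messages (by position in the file) until at most `limit` remain. It assumes every writer to the corpus honours the lock taken by `mailbox.mbox.lock()`, and it rereads the whole corpus on every call, which is simple but costs one full scan per message.

```python
import mailbox


def add_to_corpus(corpus_path, raw_message, limit):
    """Append raw_message (str or bytes) to the mbox at corpus_path unless a
    message with the same Message-ID is already there, then drop the oldest
    messages so that at most `limit` remain.

    Returns True if the message was added, False if it was a duplicate.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    box = mailbox.mbox(corpus_path)  # created if it does not exist
    try:
        box.lock()  # raises mailbox.ExternalClashError if already locked
        new = mailbox.mboxMessage(raw_message)
        new_id = new.get("Message-ID")
        if new_id is not None:
            new_id = str(new_id).strip()
            for existing in box:
                old_id = existing.get("Message-ID")
                if old_id is not None and str(old_id).strip() == new_id:
                    return False  # duplicate: leave the corpus untouched
        box.add(new)
        # mbox keys are integers in file order, so the smallest are the oldest.
        keys = sorted(box.keys())
        for key in keys[:max(len(keys) - limit, 0)]:
            box.remove(key)
        box.flush()  # rewrites the file if anything was removed
        return True
    finally:
        box.close()  # flushes, releases the lock if held, closes the file
```

A procmail recipe with the `c` (carbon copy) flag could pipe each message to a small wrapper script that reads standard input and calls this function; such a wrapper is hypothetical and not shown.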

It is always possible that procmail is not the appropriate tool for this
last piece.

thanks,
-chuck

-- 

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail
