procmail

Re: procmail/spassassin training session

2013-09-14 14:45:09
It appears your process feeds the (spam) mails one at a time? I am not sure that is how SpamAssassin likes to be trained, since the Bayesian algorithm needs statistics to work.

SA prefers to learn from a large pile of mails all at once:

  sa-learn --spam --mbox SPAM_MAILS

where SPAM_MAILS is a file in mbox format. It is probably not a good idea to mix good mails with spam mails in that file. It learns good mails the same way, from a separate mbox file (here HAM_MAILS):

  sa-learn --ham --mbox HAM_MAILS

I update my SA db whenever I have collected over 1000 spam mails (I examine the pile first to make sure it does not contain good mails).
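
The batch approach above can be sketched as a small shell helper. This is only an illustration: THRESHOLD, SA_LEARN, count_msgs and train_if_ready are names made up for this sketch, not sa-learn features, and counting "From " separator lines is only an approximate message count for an mbox.

```shell
#!/bin/sh
# Sketch: train SpamAssassin's Bayes db from an mbox only once the
# pile is large enough. THRESHOLD and SA_LEARN are assumptions made
# for this example, not sa-learn options.
THRESHOLD=${THRESHOLD:-1000}
SA_LEARN=${SA_LEARN:-sa-learn}

# Approximate the number of messages by counting mbox "From " lines.
count_msgs() {
    if [ -f "$1" ]; then
        grep -c '^From ' "$1"
    else
        echo 0
    fi
}

# train_if_ready KIND MBOX, where KIND is "spam" or "ham".
train_if_ready() {
    kind=$1
    mbox=$2
    n=$(count_msgs "$mbox")
    if [ "$n" -ge "$THRESHOLD" ]; then
        "$SA_LEARN" "--$kind" --mbox "$mbox"
    else
        echo "skipping $mbox: $n messages, need $THRESHOLD"
    fi
}
```

For example, `train_if_ready spam SPAM_MAILS` runs `sa-learn --spam --mbox SPAM_MAILS` only when the file holds at least 1000 messages. The pile should still be checked by hand for good mails first.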

--
Zhiliang


On Sat, 14 Sep 2013, Harry Putnam wrote:

To: procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
From: Harry Putnam <reader(_at_)newsguy(_dot_)com>
Subject: procmail/spassassin training session
Date: Sat, 14 Sep 2013 11:04:20 -0400

Setup:
procmail v 3.22
Spamassassin v 3.3.2
Debian linux (testing)

I'm working in a procmail sandbox trying to teach spamassassin to do a
better job of recognizing spam. $sandbox/.procmailrc calls spamc like so:

,----
| :0fw
| | /usr/bin/spamc
|
| :0:
| * ^X-Spam-Status: Yes
| spama_spam_.in
|
| :0
| post_sa.in
`----

I'm running procmail like this:

cat mixedmail_3000m| formail -e -s procmail -m ${sandbox}/.procmailrc

I have piles of spam/ham mix, some 60,000 messages, heavily leaning to
spam, that I've accumulated.

I want a little coaching on running a spamassassin learning session.

First, I'd like to know if it matters that I have autolearn disabled in
SA config file for the duration?  I want to manually feed SA the spam
and ham so assuming I'd want autolearn off.

Another thing I wonder:

I planned to feed 3 thousand messages to SA then pick the falsely
filed ham out by hand and feed it again in an
`sa-learn --mbox --spam falseham' command.

Now if I run the same several thousand messages thru SA as incoming
mail for the 2nd time, will SA do a better job of separating the ham
and spam? Or do I need to use different unprocessed mail for the
second run?

How many messages would make an effective learning session?
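
One way to judge how far a learning session has got is to look at the message counts stored in the Bayes database. As far as I know, SpamAssassin by default does not apply Bayes scores at all until it has learned at least 200 spam and 200 ham messages. The sketch below assumes that `sa-learn --dump magic` prints lines ending in "non-token data: nspam" and "non-token data: nham" with the count in the third column. bayes_counts is a made-up helper name.

```shell
#!/bin/sh
# Sketch: summarise how many spam and ham messages the Bayes db has
# learned. Assumes the `sa-learn --dump magic` line layout described
# above (count in the third whitespace-separated column).
bayes_counts() {
    awk '/non-token data: nspam$/ { spam = $3 }
         /non-token data: nham$/  { ham = $3 }
         END { printf "nspam=%d nham=%d\n", spam, ham }'
}

# Usage: sa-learn --dump magic | bayes_counts
```

Running it before and after a session shows whether the new mail was actually learned.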
____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail
