Re: procmail/spamassassin training session

Zhiliang <hu(_at_)animalgenome(_dot_)org> writes:

It appears your process feeds the (spam) mails one by one?  I am not
sure that is how SpamAssassin likes to work, since its Bayesian
algorithm needs statistics.

SA prefers to learn from a large pile of mails at once:

  sa-learn --spam --mbox SPAM_MAILS

where SPAM_MAILS is a file in mbox format.  It is probably not a good
idea to mix good mail with spam in that file.  Good mail is learned
the same way from a separate mbox file (called HAM_MAILS here):

  sa-learn --ham --mbox HAM_MAILS

I update my SA db whenever I have collected over 1000 spam mails (I
examine them first to make sure no good mail is among them).


My post was confusing... sorry.  I am doing what you suggest above.

The difference is that I first run a pile of mail (mixed ham/spam)
through SA, just as if it were normal mail.

Then from that result I pick out the spam that was classified as ham
and the ham that was classified as spam.

Now I have two piles: one all spam and one all ham.

Then I run the learning sessions from those mbox files.
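
Roughly, that last step looks like the sketch below.  The function
name train_bayes and the SA_LEARN variable are made up for
illustration; the only real pieces are sa-learn and its --ham, --spam
and --mbox options.  It assumes the two piles have already been sorted
by hand.

```shell
#!/bin/sh
# Sketch: train SpamAssassin's Bayes db from two hand-sorted mbox files.
# SA_LEARN is a hypothetical override so a different sa-learn binary
# (or a stub, for a dry run) can be used.
SA_LEARN=${SA_LEARN:-sa-learn}

# train_bayes HAM_MBOX SPAM_MBOX  (hypothetical helper, not part of SA)
train_bayes() {
    ham_mbox=$1
    spam_mbox=$2

    # Refuse to run if either pile is missing or empty.
    for f in "$ham_mbox" "$spam_mbox"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty mbox: $f" >&2
            return 1
        fi
    done

    # Learn each pile in one batch, ham first, then spam.
    "$SA_LEARN" --ham --mbox "$ham_mbox" || return 1
    "$SA_LEARN" --spam --mbox "$spam_mbox" || return 1
}
```

Called as "train_bayes ham.mbox spam.mbox" (file names are
placeholders).  Afterwards "sa-learn --dump magic" should show how many
ham and spam messages the db has learned from.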

My questions were:

1) Does it matter that I have autolearn turned off in the spamassassin
conf file 'local.cf' while doing my sandbox work?

2) I've derived the mbox files of pure ham and pure spam by running
mixed mail through SA, so SA has already seen this mail.

Now that it is sorted into known spam/ham, can I use that same mail
for the learning sessions?
 

____________________________________________________________
procmail mailing list   Procmail homepage: http://www.procmail.org/
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)de
http://mailman.rwth-aachen.de/mailman/listinfo/procmail