
Re: [Asrg] 6. Proposals - Bayesian filtering and Yahoo (Was Re: C/R)

2003-12-29 15:21:12
Matthew Elvey <matthew(_at_)elvey(_dot_)com> wrote:
> Data point: Yahoo is trying and failing.  To PAYING customers of
> MailPlus ( http://mailplus.mail.yahoo.com/ , it includes SBC DSL and
> dial-up customers) they claim you can "Train your personal spam filters
> to recognize what /you/ consider spam" using the SpamGuard feature.

  Finland has a large mail redirection system set up, similar to
pobox.  I forget what it's called (ikki?), but it's fairly large.

  Last I heard, they're using SpamAssassin, which is (mostly) keeping
up with the load.  They split the mail into filtered & unfiltered
streams, with the filtered streams going to dedicated machines.  After
filtering, the messages are sent back into the main queue, and
forwarded to the final destination.
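
  A minimal sketch of that split, to make the flow concrete.  This is
not the Finnish service's actual setup: the queue objects, the
FILTERED_USERS set, and looks_like_spam() are hypothetical stand-ins
(the last one for a real filter such as SpamAssassin).

```python
from collections import deque

# Hypothetical: users who opted in to filtering.
FILTERED_USERS = {"alice@example.org"}


def looks_like_spam(message):
    """Placeholder for a real filter such as SpamAssassin."""
    return "viagra" in message["body"].lower()


def route(message, main_queue, filter_queue):
    """Send unfiltered mail for opted-in users to the filter hosts;
    everything else goes straight to the main forwarding queue."""
    needs_filtering = (
        message["rcpt"] in FILTERED_USERS and not message.get("filtered")
    )
    if needs_filtering:
        filter_queue.append(message)
    else:
        main_queue.append(message)


def run_filter_host(filter_queue, main_queue):
    """Dedicated filter machine: score each message, mark it as
    filtered, then hand it back to the normal routing step."""
    while filter_queue:
        msg = filter_queue.popleft()
        msg["spam"] = looks_like_spam(msg)
        msg["filtered"] = True  # prevents a second trip through the filter
        route(msg, main_queue, filter_queue)
```

  The "filtered" flag is what lets the message re-enter the main
queue without looping back to the filter machines.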

  CPU time is relatively cheap, so this works.  Sendmail can handle
large amounts of traffic, with careful design.

  But Bayesian filtering takes large amounts of disk space per user.
I've heard numbers from 1 MB to 10 MB, with much of the database touched
during active filtering.  For a system forwarding messages for tens of
thousands of users, that's expensive.  The disk space isn't the problem;
it's thousands of accounts simultaneously reading and writing their
10 MB databases.
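
  A back-of-the-envelope version of that arithmetic.  The user count
(20,000) and the number of simultaneously active accounts (1,000) are
illustrative assumptions picked from the ranges above, not measured
figures.

```python
MB = 10**6
GB = 10**9


def bayes_footprint(users, db_bytes, concurrent):
    """Return (total bytes on disk, bytes being touched at once)."""
    total_on_disk = users * db_bytes
    hot_working_set = concurrent * db_bytes
    return total_on_disk, hot_working_set


# Assumed: 20,000 users, 10 MB per database, 1,000 active at once.
total, hot = bayes_footprint(users=20_000, db_bytes=10 * MB, concurrent=1_000)
print(f"on disk: {total / GB:.0f} GB, touched at once: {hot / GB:.0f} GB")
```

  Under these assumptions the total is only about 200 GB, but roughly
10 GB of per-user data is being read and written at the same time,
which is where the cost comes from.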

  When coupled with many people using the forwarding service to
subscribe to mailing lists, it's nearly impossible to partition that
load in a way which doesn't increase it.
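
  One way to read the partitioning problem, as a rough sketch: if
per-user databases are spread across machines by hashing the
recipient, a single mailing-list message with many subscribers lands
on most or all partitions.  The shard count, the hashing scheme, and
the subscriber addresses here are all assumptions for illustration.

```python
import zlib


def shard_for(user, shards):
    """Hypothetical placement: hash the recipient address to a shard."""
    return zlib.crc32(user.encode("utf-8")) % shards


def shards_touched(recipients, shards):
    """Number of distinct shards one message must visit."""
    return len({shard_for(r, shards) for r in recipients})


# One list posting delivered to 500 subscribers of the forwarding service.
subscribers = [f"user{i}@example.org" for i in range(500)]
print(shards_touched(subscribers, 8), "of 8 shards handle this one message")
```

  Each shard that is touched has to receive and score its own copy,
so splitting by user multiplies the work for list traffic instead of
dividing it.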

  Alan DeKok.

_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg


