Thought I would let everyone know about a sweet system I have running in
production at a client site. It provides highly effective content filtering
with virtually zero false positives, and without requiring any user input.
It is based on pymilter, SPF, the open-source DSpam content filter, an
auto-whitelist, and a honeypot.
1) One or more local mailboxes are configured as a "honeypot". Any email
sent to any of these addresses is used to train DSpam as spam, then
discarded. The honeypot addresses are listed on webpages, and posted to a
few newsgroups. :-)
2) Any address that a local user sends email to is added to a whitelist.
The whitelist may also be edited manually.
3) Incoming mail that
   a) matches the whitelist, and
   b) gets an SPF pass (including via heuristics like best_guess)
is never rejected due to content. It is used to train DSpam as ham
instead. (This shifts to train-on-error mode when the training database
reaches a certain size.)
4) Incoming mail that is not whitelisted or fails to get an SPF pass (and
is not otherwise rejected by e.g. SPF fail) gets quarantined if it looks
like spam. A DSN is sent to the alleged sender first. If the DSN is not
accepted, the message still trains DSpam as spam, but is discarded instead
of quarantined.
5) Users can still forward selected spams to a magic address (e.g.
spam(_at_)example(_dot_)com) and use a web interface to peruse and possibly
release quarantined spam, but this is not required for the filter to be
effective.
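
To make the flow in steps 1 through 4 concrete, here is a minimal Python
sketch of the decision logic. It is not the actual milter code: the function
name, the Action values, and the boolean inputs are all made up for
illustration, and checking the honeypot before SPF is an assumption. What
happens to non-whitelisted mail that does not look like spam is not spelled
out above; the sketch simply delivers it without training.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical outcome labels; the real system's names are not given.
    TRAIN_SPAM_DISCARD = "train DSpam as spam, then discard"
    REJECT = "reject at SMTP time"
    TRAIN_HAM_DELIVER = "train DSpam as ham, then deliver"
    DELIVER = "deliver without training"
    TRAIN_SPAM_QUARANTINE = "train DSpam as spam, then quarantine"


def decide(to_honeypot, whitelisted, spf_result, looks_like_spam, dsn_accepted):
    """Pick an action for one incoming message.

    spf_result is a lowercase SPF result such as "pass", "fail" or "none";
    a best_guess pass is assumed to be reported as "pass" as well.
    dsn_accepted says whether the alleged sender's MX accepted our DSN.
    """
    # Step 1: anything addressed to a honeypot is spam by definition.
    if to_honeypot:
        return Action.TRAIN_SPAM_DISCARD
    # Forgeries are rejected before content is ever considered.
    if spf_result == "fail":
        return Action.REJECT
    # Step 3: whitelisted AND SPF pass is never rejected due to content.
    if whitelisted and spf_result == "pass":
        return Action.TRAIN_HAM_DELIVER
    # Step 4: everything else is subject to the content filter.
    if not looks_like_spam:
        return Action.DELIVER  # assumption: no training in this case
    if dsn_accepted:
        return Action.TRAIN_SPAM_QUARANTINE
    return Action.TRAIN_SPAM_DISCARD
```

For example, decide(False, True, "pass", True, False) returns
Action.TRAIN_HAM_DELIVER: a whitelisted sender with an SPF pass gets
through no matter what the content looks like.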
For a single shared dictionary (DSpam token stats), the only configuration
required is to select some honeypot addresses! The system neatly trains
itself based on two key characteristics of spam:
o unsolicited (spam sent to a honeypot address provides positive training,
  solicited mail provides negative training via the whitelist)
o bulk (it has to arrive in a statistically significant quantity to
  affect DSpam)
The obvious weak point of this system is zombies. Should the machine of any
whitelisted sender be turned into a zombie, the zombie could crank out
spam and poison the DSpam dictionary in the process. So far, this is not
a problem in practice because most zombies forge the sender (and hence
don't pass SPF). I'm not sure what the next step is when zombie writers
start using senders filched from the local machine that get an SPF pass
and are likely whitelisted.
Another weak point is that an initial contact that looks like spam will
likely be quarantined. In this case, at least the sender will know about
the problem via the DSN and can try a simpler message (the DSN suggests
plain text instead of HTML) or a phone call.
Nits: senders that use local part encodings that we don't know about won't
match the whitelist. (We know about SRS and SES.) Whitelisted addresses
expire after a period of no activity (currently 30 days).
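
As a rough illustration of the auto-whitelist with expiry, here is a small
Python sketch. The class and method names are hypothetical, not the real
implementation. It assumes that "activity" means a local user sending mail
to the address, and that addresses are compared case-insensitively. SRS and
SES decoding is left out entirely.

```python
import time

EXPIRY_SECONDS = 30 * 24 * 3600  # 30 days, the current setting


class AutoWhitelist:
    """Hypothetical in-memory whitelist keyed by recipient address."""

    def __init__(self, expiry=EXPIRY_SECONDS):
        self.expiry = expiry
        self.last_seen = {}  # normalized address -> time of last activity

    @staticmethod
    def _key(addr):
        # Assumption: compare addresses case-insensitively.
        return addr.strip().lower()

    def touch(self, addr, now=None):
        """Record that a local user just sent mail to addr."""
        stamp = time.time() if now is None else now
        self.last_seen[self._key(addr)] = stamp

    def contains(self, addr, now=None):
        """True if addr is whitelisted and has not expired."""
        current = time.time() if now is None else now
        key = self._key(addr)
        stamp = self.last_seen.get(key)
        if stamp is None:
            return False
        if current - stamp > self.expiry:
            del self.last_seen[key]  # drop the stale entry
            return False
        return True
```

The milter would call touch() for each recipient of outgoing mail and
contains() for the sender of incoming mail. The now argument only exists
so the expiry can be exercised without waiting 30 days.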
Stats: daily average over 2.5 days for the one installed site:

  6477  SMTP connections
  2453  aborted (e.g. CBV to our site)
   843  rejected due to forgery (SPF fail and other checks)
   849  quarantined as spam
   189  TEMPFAIL (mostly DNS timeout during SPF lookup - likely spam)
  2143  delivered
This customer conducts business via email, so the delivered count is high
even though the delivered mail is virtually spam free.
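
For anyone who wants the same numbers as percentages, a few lines of Python
will do. The dictionary keys are just labels I made up for the categories
above; the counts are the ones listed.

```python
TOTAL_CONNECTIONS = 6477

COUNTS = {
    "aborted": 2453,
    "rejected": 843,
    "quarantined": 849,
    "tempfail": 189,
    "delivered": 2143,
}


def shares(counts):
    """Return each category as a percentage of all listed connections."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # The five categories account for every SMTP connection.
    assert sum(COUNTS.values()) == TOTAL_CONNECTIONS
    for name, pct in shares(COUNTS).items():
        print(f"{name:12s} {COUNTS[name]:5d} {pct:5.1f}%")
```

The five categories add up to exactly the 6477 connections, and delivered
mail comes to roughly a third of them.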
Comments?
--
Stuart D. Gathman <stuart(_at_)bmsi(_dot_)com>
Business Management Systems Inc. Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.
-------
Sender Policy Framework: http://spf.pobox.com/
Archives at http://archives.listbox.com/spf-discuss/current/