[spf-discuss] SPF enables fully automatic spam filter


Thought I would let everyone know about a sweet system I have running in
production at a client site.  It provides highly effective content filtering
with virtually zero false positives, and without any user input.  It is
based on pymilter, SPF, the open-source DSpam content filter, an
auto-whitelist, and a honeypot.

1) One or more local mailboxes are configured as a "honeypot".  Any email 
sent to any of these addresses is used to train DSpam for spam, then 
discarded.  The honeypot addresses are listed on webpages, and posted to a 
few news-groups.  :-)

2) The recipients of any email sent by local users are added to a
whitelist.  The whitelist may also be edited manually.

3) Incoming mail that both
  a) matches the whitelist, and
  b) gets an SPF pass (including via heuristics like best_guess)
is never rejected due to content.  It is used to train DSpam as ham
instead.  (Training shifts to train-on-error mode once the training
database reaches a certain size.)

4) Incoming mail that is not whitelisted or fails to get an SPF pass (and
is not otherwise rejected by e.g. SPF fail) gets quarantined if it looks
like spam.  A DSN is sent to the alleged sender first.  If the DSN is not
accepted, the message still trains DSpam as spam, but is discarded instead
of quarantined.

5) Users can still forward selected spams to a magic address (e.g.
spam(_at_)example(_dot_)com) and use a web interface to peruse and possibly
release quarantined spam, but this is not required for the filter to be
effective.
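
The five steps above can be sketched as a single routing function.  This is
only an illustration in plain Python, not the actual pymilter code: the names
Message, Action, route and HONEYPOTS are made up for the example,
looks_like_spam stands in for the DSpam verdict, and dsn_accepted stands in
for whether the alleged sender accepted the DSN.  Mail already rejected
earlier (e.g. SPF fail) is assumed never to reach this function.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DISCARD = "discard"
    DELIVER = "deliver"
    QUARANTINE = "quarantine"


@dataclass
class Message:
    sender: str
    recipient: str
    spf_result: str            # "pass", "none", "softfail", ...
    looks_like_spam: bool      # stand-in for the DSpam classification
    dsn_accepted: bool = True  # stand-in for the DSN delivery outcome


# Hypothetical honeypot address, for illustration only.
HONEYPOTS = {"honeypot@example.com"}


def route(msg, whitelist):
    """Return (action, training), where training is "spam", "ham" or None."""
    # Step 1: honeypot mail trains as spam and is discarded.
    if msg.recipient.lower() in HONEYPOTS:
        return Action.DISCARD, "spam"
    # Step 3: whitelisted AND SPF pass is never rejected on content.
    if msg.sender.lower() in whitelist and msg.spf_result == "pass":
        return Action.DELIVER, "ham"
    # Step 4: everything else is judged on content.
    if not msg.looks_like_spam:
        return Action.DELIVER, None
    if msg.dsn_accepted:
        return Action.QUARANTINE, "spam"
    return Action.DISCARD, "spam"


whitelist = {"alice@example.org"}
trusted = Message("alice@example.org", "bob@example.com", "pass", True)
print(route(trusted, whitelist))
```

Note that the whitelisted sender is delivered even though the content
classifier flagged the message; that is the point of step 3.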

For a single shared dictionary (DSpam token stats), the only configuration
required is to select some honeypot addresses!  The system neatly trains
itself based on key characteristics of spam:
  o unsolicited (spam sent to a honeypot address provides positive training,
    solicited mail provides negative training via the whitelist)
  o bulk (has to be a statistically significant quantity to affect DSpam)

The obvious weak point of this system is zombies.  Should the machine of
any whitelisted sender become a zombie, it would be possible for the zombie
to crank out spam - and poison the DSpam dictionary in the process.  So
far, this is not a problem in practice because most zombies forge the
sender (and hence don't pass SPF).  I'm not sure what the next step is
when zombie writers start using senders filched from the local machine
that get SPF pass and are likely whitelisted.

Another weak point is that an initial contact that looks like spam will
likely be quarantined.  In this case, at least the sender will know about
the problem via the DSN and can try a simpler message (the DSN suggests
plain text instead of HTML) or a phone call.

Nits: senders that use local part encodings that we don't know about won't
match the whitelist.  (We know about SRS and SES.)  Whitelisted addresses
expire after a period of no activity (currently 30 days).
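
To make the two nits concrete, here is a rough sketch of a whitelist that
undoes one known local-part encoding and expires idle entries.  It is not the
production code: AutoWhitelist, canonical_sender and touch are names invented
for the example, only the common SRS0 layout
(SRS0=hash=timestamp=domain=local(_at_)forwarder) is handled, and SES and
SRS1 are left out.  Treating "activity" as a local user sending to the
address is also an assumption.

```python
import time

EXPIRY_SECONDS = 30 * 24 * 3600  # 30 days, as in the post


def canonical_sender(address):
    """Undo an SRS0 rewrite if present; otherwise just lowercase."""
    local, _, _domain = address.rpartition("@")
    if local.upper().startswith("SRS0") and len(local) > 5:
        # Fields after the SRS0 tag: hash, timestamp, original domain,
        # original local part (which may itself contain "=").
        parts = local[5:].split("=", 3)
        if len(parts) == 4:
            return f"{parts[3]}@{parts[2]}".lower()
    return address.lower()


class AutoWhitelist:
    def __init__(self, expiry=EXPIRY_SECONDS):
        self.expiry = expiry
        self.last_seen = {}  # canonical address -> time of last activity

    def touch(self, address, now=None):
        """Record activity, e.g. a local user sending mail to address."""
        now = time.time() if now is None else now
        self.last_seen[canonical_sender(address)] = now

    def contains(self, address, now=None):
        """True if address is whitelisted and has not expired."""
        now = time.time() if now is None else now
        seen = self.last_seen.get(canonical_sender(address))
        return seen is not None and now - seen <= self.expiry


wl = AutoWhitelist()
wl.touch("alice@example.org", now=0)
print(wl.contains("SRS0=HHH=TT=example.org=alice@fwd.example.net", now=86400))
```

A sender whose address was rewritten with an encoding this function does not
recognise simply falls through to the lowercase comparison and misses the
whitelist, which is the failure mode described above.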

Stats: daily average over 2.5 days for the one installed site:

  6477  SMTP connections
  2453  aborted (e.g. CBV to our site)
   843  Rejected due to forgery (SPF fail and other checks)
   849  Quarantined as spam
   189  TEMPFAIL (mostly DNS timeout during SPF lookup - likely spam)
  2143  delivered

This customer conducts business via email, so delivered count is high 
despite being virtually spam free.
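
For anyone who wants the figures above as proportions, a few lines of Python
do the arithmetic.  The counts are copied from the table; the dictionary keys
and the shares helper are just names chosen for this sketch.

```python
daily = {
    "aborted": 2453,
    "rejected (forgery)": 843,
    "quarantined": 849,
    "tempfail": 189,
    "delivered": 2143,
}
TOTAL_CONNECTIONS = 6477


def shares(counts, total):
    """Return each outcome as a percentage of all SMTP connections."""
    return {name: 100.0 * n / total for name, n in counts.items()}


# The five outcomes account for every connection in the table.
assert sum(daily.values()) == TOTAL_CONNECTIONS

for name, pct in shares(daily, TOTAL_CONNECTIONS).items():
    print(f"{name:20s} {pct:5.1f}%")
```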

Comments?

-- 
              Stuart D. Gathman <stuart(_at_)bmsi(_dot_)com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flamis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.

-------
Sender Policy Framework: http://spf.pobox.com/
Archives at http://archives.listbox.com/spf-discuss/current/