SPF and reputation systems

On Fri, Nov 21, 2003 at 10:52:24AM -0800, Justin Mason wrote:
| 
| Hi Meng -- could you go into detail on how the reputation system
| will work?
| 
| I'm concerned, because Razor *doesn't work very well*. ;)  There's a large
| number of FPs caused by bad nominations in their dbs, it is often down
| [*], and it's slow.
| 

When a bank considers a loan, it asks a credit bureau for a reputation
score.  The credit bureau doesn't say "yes" or "no": it provides a risk
metric which the bank uses to make a decision.  That metric comes from
the applicant's history, run through some statistics.  What comes out is
a credit rating number somewhere between 0 and 100.

Today, DNSBLs and RHSBLs give only a "yes/no" answer, which is
coarse-grained.  That coarseness is one cause of false positives.

I see RHSBL reputation systems of the future providing richer data:

   domain: yahoo.com
   born:   199501
   total:  4.3E12 reported messages
   spam:   1.2E3  reported messages
   ratio:  2.8E-10

   domain: superspammer.net
   born:   200303
   total:  6.3E7 reported messages
   spam:   3.4E7 reported messages
   ratio:  0.54

Lots of other numbers are possible, like running averages for the last
day, week, month, etc.
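
As a rough illustration of the records above, here is a minimal Python
sketch.  The class and field names are my own invention for this example,
not part of any existing RHSBL; the ratio is simply spam reports divided
by total reports.

```python
from dataclasses import dataclass

@dataclass
class DomainReputation:
    """Hypothetical record shape; field names mirror the example above."""
    domain: str
    born: str      # YYYYMM, as in the records above
    total: float   # reported messages seen from the domain
    spam: float    # of those, messages reported as spam

    @property
    def ratio(self) -> float:
        # Avoid dividing by zero for a domain nobody has seen yet.
        return self.spam / self.total if self.total else 0.0

yahoo = DomainReputation("yahoo.com", "199501", 4.3e12, 1.2e3)
spammer = DomainReputation("superspammer.net", "200303", 6.3e7, 3.4e7)
print(f"{yahoo.domain}: {yahoo.ratio:.1e}")
print(f"{spammer.domain}: {spammer.ratio:.2f}")
```

Running averages for the last day, week, or month could be added as
further fields computed the same way over a shorter window.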

This lets SMTP receivers (at either the ISP or the per-user level)
construct rules like:

 if sender is whitelisted: accept
 if ratio > 0.40:          reject
 if ratio > 0.20:          save to spam folder

 if age < 3 days and total < 1000 messages: greylist
 if                  total < 1000 messages: content-filter
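
The rules above could be sketched in Python roughly as follows.  The
function name, the argument names, and the action strings are
assumptions for illustration.  The final "accept" for mail that matches
no rule is also my assumption; the rules above do not say what the
default is.

```python
def decide(ratio, age_days, total, whitelisted=False):
    """Apply the example rules in order; the first match wins."""
    if whitelisted:
        return "accept"
    if ratio > 0.40:
        return "reject"
    if ratio > 0.20:
        return "spam-folder"
    # Young, low-volume domains have no track record yet.
    if age_days < 3 and total < 1000:
        return "greylist"
    if total < 1000:
        return "content-filter"
    # Assumed default: nothing above matched, so let the mail through.
    return "accept"

print(decide(ratio=0.54, age_days=250, total=6.3e7))
print(decide(ratio=0.0, age_days=1, total=12))
```

An ISP or an individual user would change the thresholds, or reorder
the rules, to taste.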

Orson Scott Card's short story "Investment Counselor" comes to mind.

Commercial systems can provide domain-based reputation data to ISPs as a
paid service.  Reputation systems can either provide the raw numbers and
let ISPs construct rules for accept/reject/greylist/content-filter; or
they can package generally accepted rules in a kit.  Ideally, individual
users could override an ISP's default ruleset even at SMTP time.

Of course, the reporters need a reputation system of their own, so their
votes can be modded up and down.

If you see a large sample of the mailstream, you can tell when an
essentially similar message is going out to lots of people; this is one
idea behind Razor.  Now, given a bulk mailing, if only one person votes
it as spam, we know they're being lazy, or making a mistake.  If they're
consistently lazy, they get modded down.  If a majority of the voters
agree it is spam, it probably is, and they get modded up.
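
Here is one way the modding could work, as a sketch only.  It assumes
each voter on a bulk mailing says either "spam" or "not spam", that a
strict majority decides, and that every agreement or disagreement is
worth one point.  None of those choices come from Razor; they are
placeholders.

```python
from collections import defaultdict

def score_reporters(votes, scores=None, step=1):
    """votes maps a reporter id to True if they called the message spam.

    Only reporters who voted "spam" are adjusted: up if a strict
    majority of voters agreed with them, down otherwise.
    """
    scores = defaultdict(int, scores or {})
    if not votes:
        return dict(scores)
    spam_votes = sum(1 for said_spam in votes.values() if said_spam)
    majority_says_spam = spam_votes * 2 > len(votes)
    for reporter, said_spam in votes.items():
        if not said_spam:
            continue
        scores[reporter] += step if majority_says_spam else -step
    return dict(scores)

# A lone complainer on one mailing, then part of a majority on the next.
scores = score_reporters({"alice": True, "bob": False, "carol": False})
scores = score_reporters({"alice": True, "bob": True, "carol": False}, scores)
print(scores)
```

A real system would want to weight each vote by the voter's current
score, so that consistently lazy reporters count for less over time.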

If looking at message content and doing all kinds of smart hashing to
identify similarity is too much of a burden, we can get away with
counting the total message volume seen from an SPF-authenticated domain
and the number of spam complaints for that domain.  This doesn't lose
too much accuracy, and it's easier to build a distributed scheme among
a group of ISPs who trust each other and share mailstream summaries.
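
To make the simpler scheme concrete, here is a sketch of merging
per-domain summaries from several ISPs.  The summary format (a dict
mapping each domain to a pair of message count and complaint count) is
an assumption made for this example, not an agreed exchange format.

```python
from collections import Counter

def merge_summaries(summaries):
    """Sum per-domain (messages seen, spam complaints) across ISPs.

    Returns {domain: (total, complaints, ratio)}.
    """
    total = Counter()
    spam = Counter()
    for summary in summaries:
        for domain, (seen, complaints) in summary.items():
            total[domain] += seen
            spam[domain] += complaints
    return {
        d: (total[d], spam[d], spam[d] / total[d] if total[d] else 0.0)
        for d in total
    }

isp_a = {"example.com": (1000, 1), "superspammer.net": (50, 30)}
isp_b = {"example.com": (3000, 3)}
merged = merge_summaries([isp_a, isp_b])
print(merged)
```

Because only counts are exchanged, no ISP has to reveal message
content or recipient addresses to the others.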

Different subcultures can have different opinions: then you have the
amazon.com-style "people who voted like you thought this message was
spam."  This opens the door to really fine-grained decision making.

Now, the goals of SPF + reputation systems are, in order of increasing
scope:

1) stop joe-jobs, worms, and viruses
2) enable better spam vs ham decision making, by encouraging
   blacklisting on the basis of domains, not IP addresses.
   (IP blacklists tar too many senders with the same brush.)
3) reduce false positives from good senders who publish SPF
4) make spammers more accountable by forcing them to use their own domains.
5) make spam a losing proposition so spammers eventually give up.
6) after winning the war, fade into invisibility.
   (Like any good immune system, SPF should work in the background.)

The reputation system is only a step along the way.  But it is easy to
get distracted by the details and forget to move the overall plan forward.

We must be careful not to entrench the intermediate stages, when
reputation systems are dominant.  Already antivirus vendors are moving
into the antispam space.  If Microsoft patched all their holes, virus
writers would lose their hobby, but antivirus vendors would lose their
shirts.  There is a structural conflict of interest here.  Let's not
attract Michael Moore's attention.

Jane Jacobs's "Systems of Survival" comes to mind.

With fine-grained decision making, after enough feedback, a reputation
system might be able to tell how you'll vote in the next election, based
simply on which political party's mail you call spam.  Smart spammers
may start asking reputation systems for demographic data so they can spam
only people who want their spam.  Reputation systems which are strapped
for cash may give it to them.  Hue and cry will ensue.  Laws may be
passed.

There are a couple of ways to avoid that future.

The obvious way is for ISPs to encourage a competitive market by
subscribing to more than one reputation service.

But ISPs can also get together and develop their own cooperative
distributed reputation system.  That would be nicer, because that stands
a better chance of obviating the "natural monopoly" problem.  The best
way to avoid a buildup of vested interests is to craft a resolutely
nonprofit architecture.

If twenty mid-to-large ISPs got together and shared mailstream summaries
on a daily basis, plus the results of automated content filtering, plus
the results of a simple Razor- or apple.com-style "This is junk mail"
voting scheme, we could have a working noncommercial reputation system
up and running surprisingly quickly.

And we can do this right now, even though nobody has SPF records!
We can approximate SPF using "a/24 mx/24 ptr"; if there's a pass, it's
good enough for what we're trying to do.
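
A sketch of that approximation, using Python's standard ipaddress
module.  To keep it self-contained, the DNS answers are passed in
rather than looked up: a_addrs, mx_addrs, and ptr_names are
hypothetical arguments holding the domain's A addresses, the addresses
of its MX hosts, and the already-validated reverse DNS names of the
client.  It handles IPv4 only.

```python
import ipaddress

def approx_spf_pass(client_ip, domain, a_addrs, mx_addrs, ptr_names):
    """Approximate "a/24 mx/24 ptr" for a sender domain with no SPF record."""
    ip = ipaddress.ip_address(client_ip)
    # a/24 and mx/24: client shares a /24 with any A or MX address.
    for addr in list(a_addrs) + list(mx_addrs):
        net = ipaddress.ip_network(f"{addr}/24", strict=False)
        if ip in net:
            return True
    # ptr: a reverse DNS name of the client falls within the domain.
    domain = domain.lower().rstrip(".")
    for name in ptr_names:
        name = name.lower().rstrip(".")
        if name == domain or name.endswith("." + domain):
            return True
    return False

print(approx_spf_pass("192.0.2.77", "example.com", ["192.0.2.10"], [], []))
print(approx_spf_pass("198.51.100.5", "example.com", ["192.0.2.10"], [], []))
```

Anything that does not pass would just be counted as unauthenticated,
not rejected, since this is only an approximation.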

Grepping the maillogs for tuples of smtp_client_ip and sender domain
would be enough to start.

-------
Sender Permitted From: http://spf.pobox.com/
Archives at http://archives.listbox.com/spf-discuss/current/
Latest draft at http://spf.pobox.com/draft-mengwong-spf-02.6.txt