ietf-asrg

[Asrg] Statistical Analysis shows SPF should work Pretty Well

2003-06-12 18:45:43
Executive Summary:

   Whether the sender domain matches the client IP is a strong predictor
   of spamminess.

   http://dumbo.pobox.com/spam-sensor/analysis01.png

Analysis:

I analyzed 6,810,374 unique deliveries over a two-month period whose
senders claimed to be from aol.com, hotmail.com, and yahoo.com.  Those
deliveries came from 1,885,248 distinct email senders.  I classified
those senders using statistical methods into 1,775,660 spammer
addresses and 109,588 nonspammer addresses.

Of the 1,775,660 addresses which my classifier decided were more
likely to be spammers than not-spammers, only 4,188 (about 0.24%)
actually originated from aol, hotmail, or yahoo.  That is a negligible
fraction and reflects more on the imperfection of my classifier scheme
than anything else.  The classifier scheme is described at
http://dumbo.pobox.com/spam-sensor/.
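
The arithmetic behind those figures is easy to recheck.  The sketch
below only restates the counts quoted in this message and derives the
percentages from them; the variable and function names are mine, not
part of the analysis scripts.

```python
# Counts quoted in this message; the names are illustrative only.
TOTAL_SENDERS = 1_885_248        # distinct sender addresses
SPAMMER_ADDRESSES = 1_775_660    # classified as spammers
NONSPAMMER_ADDRESSES = 109_588   # classified as nonspammers
SPAMMERS_FROM_PROVIDER = 4_188   # "spammers" that really came from aol/hotmail/yahoo


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# The two classes should account for every distinct sender.
counts_add_up = SPAMMER_ADDRESSES + NONSPAMMER_ADDRESSES == TOTAL_SENDERS

# Share of all sender addresses that were classified as spammers.
spammer_share = percent(SPAMMER_ADDRESSES, TOTAL_SENDERS)

# Share of "spammer" addresses that actually originated at the providers.
provider_share = percent(SPAMMERS_FROM_PROVIDER, SPAMMER_ADDRESSES)

if __name__ == "__main__":
    print("counts add up:", counts_add_up)
    print(f"classified as spammers: {spammer_share:.2f}%")
    print(f"spammers really from aol/hotmail/yahoo: {provider_share:.2f}%")
```

So roughly 94% of the claimed aol/hotmail/yahoo sender addresses were
classified as spammers, and under a quarter of one percent of those
came from the providers themselves.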

Conclusion 1: aol, hotmail, and yahoo have successfully implemented
outbound antispam technology, i.e., ways to ensure that only humans sign
up for their accounts, or limits on per-account outbound message volume.

The analysis is described in detail at 
http://dumbo.pobox.com/spam-sensor/analysis01.txt

The important result of the analysis is a log/log scatterplot
   http://dumbo.pobox.com/spam-sensor/analysis01.png

Each dot represents one or more sender addresses; the color of the dot
represents whether the sender domain matched the PTR of the client IP.
That is sort of a proto-SPF, using PTR instead.  There is a collision
problem, but on the whole the plot communicates pretty well.
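
For readers who want to try the PTR comparison themselves, here is a
minimal sketch.  It is not the code behind the analysis: the matching
rule (the PTR hostname equals the sender domain or is a subdomain of
it) is my assumption for illustration, and the function names are
hypothetical.  Only socket.gethostbyaddr is a real library call.

```python
import socket


def sender_domain(address: str) -> str:
    """Return the lowercased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower().rstrip(".")


def ptr_matches_domain(ptr_name: str, domain: str) -> bool:
    """True if the PTR hostname is the domain itself or a subdomain of it.

    Assumed rule for illustration; the leading dot in the suffix check
    keeps "evilaol.com" from matching "aol.com".
    """
    host = ptr_name.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def client_matches_sender(client_ip: str, address: str) -> bool:
    """Reverse-resolve client_ip and compare it with the sender's domain.

    A failed reverse lookup is treated as a mismatch.
    """
    try:
        ptr_name, _aliases, _addresses = socket.gethostbyaddr(client_ip)
    except OSError:
        return False
    return ptr_matches_domain(ptr_name, sender_domain(address))
```

With made-up hostnames, ptr_matches_domain("mail01.mx.aol.com",
"aol.com") is a match, while ptr_matches_domain("dsl-1-2-3-4.example.net",
"aol.com") is not.  A PTR record is controlled by whoever owns the IP
block, so a careful check would also confirm that the returned hostname
resolves back to the same IP.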

Conclusion 2: Mail from client IPs whose PTR records do not match the
sender domain is more likely to be spam than not.

That suggests a scheme like SPF/DMP/RMX should work nicely.