
[Asrg] Automated public whitelist

2011-03-31 21:44:03
Since 2006 I have been involved with dnswl.org, and frustrated with its
lack of automation.  It's fine for Matthias Leisi to run his project as
he sees fit.  I'm finally trying to resolve my frustration by creating a
fully automated, free, public whitelist, which also includes some
blacklisting data as a side effect.

It works, and it's fully automated.  The results are looking great for
the two people providing data.  Now I just need data from more people to
make it more useful to others.

http://www.chaosreigns.com/iprep/

There are SpamAssassin rules which pull the data via DNS, in the usual
fashion.  Long term, I think I'd prefer to provide the data only via
rsync, because I think it would require less bandwidth (especially for
IPv6).  There's also a text file with the full set of aggregated data:
for each IP address, the percentage of email which has been ham, and a
count of the total emails seen.  That's oversimplified a little: the
percentage is normalized like SpamAssassin's S/O score, the data is
weighted by recency, and the count is a logarithm.
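
For readers unfamiliar with how DNS-based lists are usually queried,
here is a minimal sketch of building the query name: the address is
reversed (octets for IPv4, hex nibbles for IPv6) and prepended to the
list's zone.  The zone name iprep.example.org and the function name are
placeholders for illustration only; the real zone is whatever the
published SpamAssassin rules use.

```python
import ipaddress

# Hypothetical zone name, for illustration only (not the real zone).
ZONE = "iprep.example.org"


def dnsl_query_name(ip: str, zone: str = ZONE) -> str:
    """Build a DNSBL/DNSWL-style query name for an IPv4 or IPv6 address."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        # 192.0.2.1 -> ["1", "2", "0", "192"]
        labels = str(addr).split(".")[::-1]
    else:
        # Expand to all 32 hex nibbles, then reverse them.
        labels = list(addr.exploded.replace(":", ""))[::-1]
    return ".".join(labels) + "." + zone


if __name__ == "__main__":
    print(dnsl_query_name("192.0.2.1"))
    print(dnsl_query_name("2001:db8::1"))
```

Each IPv6 address turns into 32 single-character labels in front of the
zone, so every address needs its own, fairly long, query name.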

The data:
http://www.chaosreigns.com/iprep/iprep.txt
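
As a rough illustration of how per-IP values like these could be
derived, the sketch below computes a normalized ham percentage, weights
each message by its age, and takes a logarithm of the message count.
It is not the code behind iprep.txt.  The 30-day half-life, the base-10
logarithm, the exponential decay, and all function names are
assumptions, and the normalization is only one reading of "like
SpamAssassin's S/O score" (each class's rate is taken relative to the
corpus total for that class).

```python
import math

HALF_LIFE_DAYS = 30.0  # assumption: the post does not give a decay rate


def recency_weight(age_days, half_life=HALF_LIFE_DAYS):
    """Exponential decay: a message half_life days old counts half as much."""
    return 0.5 ** (age_days / half_life)


def ip_reputation(ham_ages, spam_ages, total_ham, total_spam):
    """Return (ham_percentage, log10_count) for one IP, or None if no data.

    ham_ages / spam_ages: ages in days of the messages seen from this IP.
    total_ham / total_spam: weighted totals over the whole corpus, used to
    normalize so that an unbalanced corpus does not skew the percentage.
    """
    ham = sum(recency_weight(a) for a in ham_ages)
    spam = sum(recency_weight(a) for a in spam_ages)
    ham_rate = ham / total_ham if total_ham else 0.0
    spam_rate = spam / total_spam if total_spam else 0.0
    if ham_rate + spam_rate == 0:
        return None
    ham_pct = 100.0 * ham_rate / (ham_rate + spam_rate)
    log_count = math.log10(len(ham_ages) + len(spam_ages))
    return ham_pct, log_count


if __name__ == "__main__":
    # One IP that sent two fresh ham messages and one 60-day-old spam.
    print(ip_reputation([0, 0], [60], total_ham=100.0, total_spam=100.0))
```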

Of the 85,002 IPs I have data for, 99.966% of them are 100% ham or 100%
spam.  So I have a very good indication of what the next email from each
of them will be.

To contribute data, create one mail folder containing only ham and
another containing only spam, say ~/mail/ham/ and ~/mail/spam/.  Then,
assuming you're using maildir, run (from cron):

iprep.pl ham:dir:~/mail/ham/ spam:dir:~/mail/spam/

If using mbox, replace "dir" with "mbox".  These are SpamAssassin
mass-check style "targets" - more info on the web page.
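
To make the shape of these targets concrete, here is a small sketch
that splits a class:format:location string into its three parts.  It
only illustrates the syntax and is not taken from iprep.pl.  The Target
type and parse_target function are made-up names, and the sketch
accepts only the two formats mentioned above.

```python
import os
from typing import NamedTuple


class Target(NamedTuple):
    label: str  # "ham" or "spam"
    fmt: str    # "dir" (maildir) or "mbox"
    path: str   # location of the mail, with a leading ~ expanded


def parse_target(spec: str) -> Target:
    """Parse a mass-check style target such as 'ham:dir:~/mail/ham/'."""
    # Split on the first two colons only, so the path may contain colons.
    label, fmt, path = spec.split(":", 2)
    if label not in ("ham", "spam"):
        raise ValueError(f"unknown class {label!r} in target {spec!r}")
    if fmt not in ("dir", "mbox"):
        raise ValueError(f"unsupported format {fmt!r} in target {spec!r}")
    # The shell leaves a mid-word ~ alone, so expand it here.
    return Target(label, fmt, os.path.expanduser(path))


if __name__ == "__main__":
    for spec in ("ham:dir:~/mail/ham/", "spam:mbox:~/mail/spam"):
        print(parse_target(spec))
```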

There's also an option to pipe an email to its STDIN, queuing data for
later upload.

For both methods, the data is uploaded via rsync, for which you'll need
an account; email me to get one.  I strongly prefer that you not email me
from a freemail account (Gmail, Yahoo, Hotmail, etc.), just to make it a
little harder for spammers to feed me bad data.

I'm mostly interested in getting more ham.  I'd also appreciate
hand-classified spam that was sent to real email addresses.  I'm not
terribly interested in extremely high-volume spam traps - while I
recognize they are very useful for blacklisting, I'm happy to leave that
to others.

My plan for IPv6 is to aggregate addresses into net blocks of some size.
I just set up IPv6 on my server, so there are a couple of its IPs in the
data, but I haven't done the aggregation yet.  This is a big case where I
think providing the data via rsync, for CIDR blocks instead of individual
IPs, is a good idea.
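
One way such aggregation might look, as a sketch only: group per-address
counts by the enclosing network and sum them.  The /64 prefix length is
an assumption (the block size is still undecided above), and the record
layout (address, ham count, spam count) and the function name are
invented for illustration.

```python
import ipaddress
from collections import defaultdict

PREFIX_LEN = 64  # assumption: the block size has not been chosen yet


def aggregate_ipv6(records, prefix_len=PREFIX_LEN):
    """Sum (ham, spam) counts per IPv6 network of the given prefix length.

    records: iterable of (ip_string, ham_count, spam_count).
    IPv4 addresses are skipped; they would stay as individual IPs.
    """
    totals = defaultdict(lambda: [0, 0])
    for ip, ham, spam in records:
        addr = ipaddress.ip_address(ip)
        if addr.version != 6:
            continue
        # strict=False masks off the host bits instead of raising.
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        totals[str(net)][0] += ham
        totals[str(net)][1] += spam
    return dict(totals)


if __name__ == "__main__":
    sample = [
        ("2001:db8::1", 3, 0),
        ("2001:db8::2", 1, 1),
        ("2001:db8:0:1::1", 0, 5),
    ]
    for net, (ham, spam) in aggregate_ipv6(sample).items():
        print(net, ham, spam)
```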

-- 
"The most elementary and valuable statement in science, the beginning
of wisdom is: 'I do not know'." - Data, ST:TNG 2x2 Where Silence Has Lease
http://www.ChaosReigns.com
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg
