
[Asrg] Spam corpuses

2003-04-10 18:56:40
Does anyone besides SpamAssassin have a decent corpus of spam for training
and testing filters?  It could include non-spam or be spam only.

A search turned up the following:

[1] The spam corpora used in Ion Androutsopoulos' papers are linked from here:
        http://www.aueb.gr/users/ion/publications.html

[2] The Spambase collection:
        ftp://ftp.ics.uci.edu/pub/machine-learning-databases/spambase

[3] SpamAssassin's public corpus:
        http://www.spamassassin.org/publiccorpus/

[4] Another corpus that turned up, spam only:
        http://clg.wlv.ac.uk/projects/junk-email/

[5] Grant Taylor's collection of spam:
        http://www2.picante.com:81/~gtaylor/download/spam.tar.gz

As far as I can tell, [1] is the data used in the original Bayesian filtering
papers.  Unfortunately, it has already been heavily processed and contains
only the Subject: header and the body.  [2] seems to contain only derived
results rather than actual messages, and [4] is also processed and missing
most of the headers.
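Since the main drawback noted above is how many headers each corpus keeps, here is a minimal Python sketch for checking that on a single raw message with the standard-library `email` package. The `header_profile` function and the `EXPECTED_HEADERS` list are hypothetical, chosen only for illustration; they are not part of any of the corpora listed.

```python
from email import message_from_string

# Hypothetical list of headers an unprocessed message would normally carry.
# This is an illustrative assumption, not a standard or a corpus format.
EXPECTED_HEADERS = ("Received", "From", "To", "Date", "Message-ID", "Subject")


def header_profile(raw: str) -> dict:
    """Report which EXPECTED_HEADERS survive in a raw message (hypothetical helper)."""
    msg = message_from_string(raw)
    # msg.get() is case-insensitive and returns None when a header is absent.
    present = [h for h in EXPECTED_HEADERS if msg.get(h) is not None]
    missing = [h for h in EXPECTED_HEADERS if msg.get(h) is None]
    return {
        "present": present,
        "missing": missing,
        # "Full" here simply means every expected header is still there.
        "full_headers": not missing,
    }


if __name__ == "__main__":
    # A processed sample that keeps only Subject: and the body.
    processed = "Subject: cheap meds\n\nbuy now\n"
    # A sample that still has its transport and envelope headers.
    raw = (
        "Received: from mx.example.org by mail.example.com\n"
        "From: a@example.org\n"
        "To: b@example.com\n"
        "Date: Thu, 10 Apr 2003 18:56:40 -0400\n"
        "Message-ID: <1@example.org>\n"
        "Subject: cheap meds\n"
        "\n"
        "buy now\n"
    )
    print(header_profile(processed))
    print(header_profile(raw))
```

Running this over a sample of files from each corpus would give a rough sense of whether it is usable for header-based filtering, or only for body and Subject: tokens.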

Does anyone else have favourite links for spam collections?  

 Terri
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg