An analysis of the data available at www.spamarchive.org showed that about
15% of it is not actually spam but the usual mailing-list traffic, joke
forwards, virus mail, etc.
Although the remaining 85% is somewhat valuable, this contamination makes
it difficult to use automated tools on the data.
Are there any efforts (or plans) to eventually clean it up?
Asrg mailing list