I intend to do a little project where I send a lot of spam[1] through a large
number of mostly commercial[2] spam-filters (which I'm doing anyway) and then
look at differences between spam that's caught by all filters, spam that is
misidentified by one filter and spam that is misidentified by more than, say,
25% of the filters. All with the purpose of finding where spam filters can be
improved.
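
To make the three groups concrete, here is a minimal sketch of how each message could be bucketed from the per-filter verdicts. The function name `classify`, the bucket labels and the shape of the `verdicts` mapping are all hypothetical, and the 25% threshold is only the "say, 25%" figure from above. The sketch assumes enough filters that a single miss stays below the threshold, so the one-miss case is checked first.

```python
def classify(verdicts, threshold=0.25):
    """Bucket one spam message by how many filters missed it.

    verdicts: dict mapping a filter name to True if that filter
    flagged the message as spam (hypothetical input format).
    """
    if not verdicts:
        raise ValueError("need at least one filter verdict")

    total = len(verdicts)
    missed = sum(1 for caught in verdicts.values() if not caught)

    if missed == 0:
        return "caught_by_all"
    if missed == 1:
        # Checked before the threshold: assumes many filters, so one
        # miss is an isolated failure rather than a widespread one.
        return "missed_by_one"
    if missed / total > threshold:
        return "missed_by_many"
    # More than one miss, but not above the threshold.
    return "other"
```

Messages landing in "other" fall between the groups named above and could either be ignored or looked at separately.
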
Things I want to look at include the location of the sender's IP, the character set,
the size of the body, the presence of an inline image (or attachment in
general), SPF[3] and whether the message is caught when it is resent after an
hour/day/week. (The last of these is to see whether it's just a matter of
signatures/blacklists not updating fast enough.) Feel free to suggest more
things to look at, or make general suggestions for the project. I'm also happy
to hear the suggestion not to run (or publish) the research at all. I am aware
that this could also give spammers some insight into which techniques are more
likely to evade filters.
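
Several of the properties listed above can be read straight from the raw message. The following is a rough sketch using only Python's standard `email` package; the function name `extract_features` and the keys of the returned dict are hypothetical. It treats any image part not marked as an attachment as an inline image, which is a heuristic, and it only records whether a DKIM-Signature header is present, not whether it is valid. The sender's IP location and the SPF result need external data and are left out.

```python
from email import message_from_bytes, policy


def extract_features(raw):
    """Pull a few per-message properties out of raw RFC 5322 bytes."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Prefer the plain-text body, fall back to HTML if there is none.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        charset = body.get_content_charset()
        body_bytes = body.get_payload(decode=True) or b""
    else:
        charset = None
        body_bytes = b""

    parts = list(msg.walk())
    return {
        "charset": charset,
        "body_size": len(body_bytes),
        # Parts explicitly marked "Content-Disposition: attachment".
        "has_attachment": any(
            p.get_content_disposition() == "attachment" for p in parts
        ),
        # Heuristic: an image part that is not marked as an attachment.
        "has_inline_image": any(
            p.get_content_maintype() == "image"
            and p.get_content_disposition() != "attachment"
            for p in parts
        ),
        # Presence only; validity cannot be checked on redacted mail.
        "has_dkim_signature": msg["DKIM-Signature"] is not None,
    }
```

Combined with the per-filter verdicts, a table of these values per message would allow a simple comparison between caught and missed spam.
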
Thanks.
Martijn.
[1] Spam in the context of this email is spam sent to spam traps. So the real,
proper spam, not the perhaps-not-100%-CAN-SPAM-compliant spam.
[2] Several of these make use of open source filters (e.g. SpamAssassin), so
it's fair to say that most filters are covered. The setup does exclude
techniques such as TCP fingerprinting or greylisting though.
[3] I would love to include DKIM, but I can only tell whether or not a message
has a DKIM signature; because the emails are redacted to hide the original
recipient, I cannot determine whether a signature that is present was
actually valid.
Virus Bulletin Ltd, The Pentagon, Abingdon, OX14 3YP, England.
Company Reg No: 2388295. VAT Reg No: GB 532 5598 33.
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg