Re: [Asrg] "Uncaught spam" research project

On Fri, Apr 30, 2010 at 10:37 AM, Martijn Grooten
<martijn(_dot_)grooten(_at_)virusbtn(_dot_)com> wrote:

I intend to do a little project where I send a lot of spam[1] through a large 
number of mostly commercial[2] spam-filters (which I'm doing anyway) and then 
look at differences between spam that's caught by all filters, spam that is 
misidentified by one filter and spam that is misidentified by more than, say, 
25% of the filters. All with the purpose of finding where spam filters can be 
improved.
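
A minimal sketch of this bucketing, assuming each message's results are available as a list of booleans, one per filter, with True meaning the filter flagged the message as spam. The names `bucket` and `summarise` and the extra `missed_by_few` bucket are hypothetical and only for illustration; the 25% threshold is the "say, 25%" figure above.

```python
from collections import Counter


def bucket(verdicts, threshold=0.25):
    """Classify one spam message by how many filters missed it.

    verdicts: list of booleans, one per filter; True = flagged as spam.
    threshold: fraction of filters above which a miss counts as widespread.
    """
    if not verdicts:
        raise ValueError("need at least one filter verdict")
    missed = sum(1 for flagged in verdicts if not flagged)
    if missed == 0:
        return "caught_by_all"
    # Checked before the threshold, so with very few filters a single
    # miss is still reported as "missed_by_one".
    if missed == 1:
        return "missed_by_one"
    if missed / len(verdicts) > threshold:
        return "missed_by_many"
    # Hypothetical extra bucket: more than one miss, but under the threshold.
    return "missed_by_few"


def summarise(all_verdicts, threshold=0.25):
    """Count how many messages fall into each bucket."""
    return Counter(bucket(v, threshold) for v in all_verdicts)
```

With ten filters, a message missed by two of them (20%) lands in `missed_by_few`, and one missed by three (30%) lands in `missed_by_many`.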


In my experience, you will find more variation in performance between
a properly configured / maintained spam filter and a system left at
defaults and forgotten about than you will find between different
vendors.  Filters use a variety of tactics to detect spam, but most
of these are common to all products and, unless a vendor has
implemented them incorrectly, they should perform identically.  For
instance, RBLs, SMTP syntax checks, DNS checks and IP connection
characteristics are all going to be common.  There is variation in
how the results of these things are
used, but this is often configurable and needs to be tweaked for a
particular type of site for best performance anyway.

Are you planning to compare these systems in their default
configurations?  If so, your results may be more an indicator of which
vendor's defaults work best for your system than anything else.

Things I want to look at include the location of the sender's IP, the character 
set, the size of the body, the presence of an inline image (or attachment in 
general), SPF[3] and whether the message is caught when it is resent after an 
hour/day/week. (The latter to see if it's just a matter of 
signatures/blacklists not updating fast enough.) Feel free to suggest more 
things to look at, or make general suggestions for the project. I'm also 
happy to hear the suggestion not to run (or publish) the research at all. I 
am aware that this could also give spammers some insight into which techniques 
are more likely to evade filters.
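
Several of these properties can be read straight from the raw message. The following is a rough sketch using Python's standard `email` package, not a description of the actual test setup. The function name `extract_features` and the dictionary keys are hypothetical. It also assumes that the receiving server recorded its SPF result in a `Received-SPF` header, which not every setup does. The location of the sender's IP is left out because it needs an external GeoIP database, and the DKIM field only records whether a signature is present, not whether it is valid.

```python
from email import message_from_bytes, policy


def extract_features(raw: bytes) -> dict:
    """Extract a few per-message properties from a raw RFC 5322 message."""
    msg = message_from_bytes(raw, policy=policy.default)
    charsets = set()
    body_size = 0
    has_image = False
    has_attachment = False

    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        charset = part.get_content_charset()
        if charset:
            charsets.add(charset)
        if part.get_content_maintype() == "image":
            has_image = True
        if part.is_attachment():
            has_attachment = True
        elif part.get_content_maintype() == "text":
            # Size of the decoded text parts, in bytes.
            body_size += len(part.get_payload(decode=True) or b"")

    # Assumption: the SPF result, if any, is the first word of Received-SPF.
    spf_header = msg.get("Received-SPF")
    spf_result = str(spf_header).split()[0].lower() if spf_header else None

    return {
        "charsets": sorted(charsets),
        "body_size": body_size,
        "has_image": has_image,
        "has_attachment": has_attachment,
        "received_spf": spf_result,
        "has_dkim_signature": msg.get("DKIM-Signature") is not None,
    }
```

Running this over each bucket of messages and comparing the distributions of these fields would be one way to see which properties correlate with being missed.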

Thanks.

Martijn.

[1] Spam in the context of this email is spam sent to spam traps. So the 
real, proper spam, not the perhaps-not-100%-CAN-SPAM-compliant spam.

[2] Several of these make use of open source filters (e.g. SpamAssassin), so 
it's fair to say that most filters are covered. The setup does exclude 
techniques such as TCP fingerprinting or greylisting though.

[3] I would love to include DKIM, but I can only distinguish between messages 
that have a DKIM signature and those that do not; the redaction of emails to 
hide the original recipient makes me unable to determine whether a signature 
that is present was actually valid.


Virus Bulletin Ltd, The Pentagon, Abingdon, OX14 3YP, England.
Company Reg No: 2388295. VAT Reg No: GB 532 5598 33.
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg
