ietf-asrg

Re: [Asrg] "Uncaught spam" research project

2010-04-30 11:07:25
Martijn Grooten <martijn(_dot_)grooten(_at_)virusbtn(_dot_)com> wrote:

I intend to do a little project where I send a lot of spam[1] through
a large number of mostly commercial[2] spam-filters (which I'm doing
anyway) and then look at differences between spam that's caught by
all filters, spam that is misidentified by one filter and spam that
is misidentified by more than, say, 25% of the filters. All with the
purpose of finding where spam filters can be improved.
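A minimal sketch of the split described above, assuming each message gets one boolean verdict per filter (True meaning the filter flagged it as spam). The bucket names and the function `bucket` are hypothetical. The 25% cut-off is the one mentioned in the email. Checking for "exactly one miss" before the percentage test is my own assumption, made so that small filter counts behave sensibly.

```python
# Hypothetical bucket labels; the 25% threshold comes from the email.
ALL_CAUGHT = "caught by all"
MISSED_BY_ONE = "missed by one filter"
MISSED_BY_MANY = "missed by more than 25%"
MISSED_BY_FEW = "missed by several, at most 25%"


def bucket(verdicts: list, threshold: float = 0.25) -> str:
    """Classify one spam message by how many filters missed it.

    verdicts[i] is True if filter i flagged the message as spam.
    """
    if not verdicts:
        raise ValueError("need at least one filter verdict")
    missed = verdicts.count(False)
    if missed == 0:
        return ALL_CAUGHT
    if missed == 1:  # checked first so "exactly one" wins with few filters
        return MISSED_BY_ONE
    if missed / len(verdicts) > threshold:
        return MISSED_BY_MANY
    return MISSED_BY_FEW
```

With, say, 40 filters, a single miss is 2.5% and lands in the "one filter" bucket, while 11 misses (27.5%) land in the "more than 25%" bucket.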

Things I want to look at include
- the location of the sender's IP,
- the character set,
- the size of the body,
- the presence of an inline image (or attachment in general),
- SPF[3],
- and whether the message is caught when it is resent after an
  hour/day/week. (The latter to see if it's just a matter of
  signatures/blacklists not updating fast enough.)

Feel free to suggest more things to look at,

   I'd definitely record the AS of the sender's IP.
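One way to record the AS is sketched below, assuming Team Cymru's DNS-based IP-to-ASN service is used. The sender's address is reversed (octets for IPv4, nibbles for IPv6) and queried as a TXT record under `origin.asn.cymru.com` or `origin6.asn.cymru.com`, and the answer is a pipe-separated line. The helpers are hypothetical and only build the query name and parse an answer. The DNS lookup itself would need a resolver library and is not shown. The service's own documentation should be checked for the exact answer format.

```python
import ipaddress


def cymru_origin_qname(ip: str) -> str:
    """Build the TXT query name for an IP-to-ASN origin lookup (assumed service)."""
    addr = ipaddress.ip_address(ip)
    rev = addr.reverse_pointer  # e.g. "1.2.0.192.in-addr.arpa"
    if addr.version == 4:
        return rev.removesuffix(".in-addr.arpa") + ".origin.asn.cymru.com"
    return rev.removesuffix(".ip6.arpa") + ".origin6.asn.cymru.com"


def parse_origin_txt(txt: str) -> dict:
    """Parse an answer of the assumed form 'ASN | prefix | CC | registry | allocated'."""
    fields = [f.strip() for f in txt.strip('"').split("|")]
    asn_field, prefix, country, registry, allocated = fields[:5]
    return {
        "asns": asn_field.split(),  # a prefix may be announced by more than one AS
        "prefix": prefix,
        "country": country,
        "registry": registry,
        "allocated": allocated,
    }
```

The same answer also carries a country code, which could double as a rough "location of the sender's IP" without a separate geolocation database.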

or make general suggestions for the project. I'm also happy to hear
the suggestion not to run (or publish) the research at all.

   Oh, definitely run it... The question is how much to obscure when
you publish it.

I am aware that this could also give spammers some insight into which
techniques are more likely to evade filters.

   Filters, hopefully, are a moving target; so whatever you publish
will be of limited use a week later.

[1] Spam in the context of this email is spam sent to spam traps.
So the real, proper spam, not the perhaps-not-100%-CAN-SPAM-compliant
spam.

   It will be necessary to at least sample the "interesting" cases,
since spamtraps do get some non-spam...
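Sampling the interesting cases for a manual check could look like the sketch below. It assumes the messages have already been grouped into buckets (for example "missed by one filter"), keyed by some message ID. `sample_for_review` and the default sample size are hypothetical. A fixed seed keeps the sample reproducible. Messages caught by every filter are skipped by default, on the assumption that those are the least likely to be legitimate mail that reached a trap.

```python
import random


def sample_for_review(buckets, per_bucket=50, seed=0, skip=("caught by all",)):
    """Draw a reproducible random sample of message IDs from each bucket."""
    rng = random.Random(seed)
    sample = {}
    for name in sorted(buckets):  # fixed order so the draws are reproducible
        if name in skip:
            continue
        ids = sorted(buckets[name])
        sample[name] = rng.sample(ids, min(per_bucket, len(ids)))
    return sample
```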

[2] Several of these make use of open source filters (e.g.
SpamAssassin), so it's fair to say that most filters are covered.
The setup does exclude techniques such as TCP fingerprinting or
greylisting though.

   That's OK, though it might be interesting to compare those
techniques. BTW, are you saying that if a (commercial?) spam-filter
uses those techniques, your setup will exclude them?
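The reason a setup that replays already-collected spam cannot exercise greylisting is that greylisting temporarily rejects the first delivery attempt for an unseen (client IP, sender, recipient) triplet. It only accepts a retry after a delay, so its effect depends on whether the original sending host retries. The class below is a simplified, hypothetical illustration of that idea, not any particular product's implementation. Real implementations also expire old entries and cap the retry window.

```python
import time


class Greylist:
    """Toy greylist keyed on the (client IP, envelope sender, recipient) triplet."""

    def __init__(self, delay=300.0, clock=time.monotonic):
        self.delay = delay  # seconds a triplet must wait before a retry is accepted
        self.clock = clock  # injectable clock, so the logic can be tested
        self.first_seen = {}

    def check(self, client_ip, mail_from, rcpt_to):
        """Return an (SMTP reply code, text) pair for one delivery attempt."""
        key = (client_ip, mail_from.lower(), rcpt_to.lower())
        now = self.clock()
        first = self.first_seen.setdefault(key, now)
        if now - first >= self.delay:
            return 250, "OK"
        return 451, "4.7.1 Greylisted, please try again later"
```

Sending software that never retries never gets past the 451. A test that hands the filter a message directly has no retry step at all, which is why it cannot measure this effect.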

[3] I would love to include DKIM, but I can only distinguish between
messages that do and do not have a DKIM-Signature; redacting the
emails to hide the original recipient makes it impossible for me to
determine whether a signature, when present, was actually valid.

   I would assume that the interesting datum is whether the DKIM
signature was valid when received, and that the DKIM signature
itself needs to be excised.
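The suggestion above implies an order of operations. First, verify the signature on the message as received and record the result. Only then redact the recipient and strip the signature. A minimal sketch with the standard `email` package follows. It assumes the verification result is computed beforehand by some DKIM verifier (dkimpy's `dkim.verify`, for example) and passed in. `redact_for_publication` and the placeholder address are hypothetical.

```python
from email import message_from_bytes, policy
from typing import Optional


def redact_for_publication(raw: bytes, dkim_valid_on_receipt: Optional[bool],
                           placeholder: str = "redacted@example.invalid"):
    """Strip DKIM signatures and hide the recipient, keeping the validity as a datum."""
    msg = message_from_bytes(raw, policy=policy.default)
    record = {
        "had_dkim_signature": "DKIM-Signature" in msg,
        "dkim_valid_on_receipt": dkim_valid_on_receipt,  # verified before redaction
    }
    del msg["DKIM-Signature"]  # removes every occurrence; no error if absent
    if "To" in msg:
        msg.replace_header("To", placeholder)
    return msg.as_bytes(), record
```

Publishing the signature itself would be pointless once the To header is rewritten, since a signature covering that header would then fail to verify. The recorded boolean is what carries the information.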

--
John Leslie <john(_at_)jlc(_dot_)net>
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg
