Re: [Asrg] 2a. Analysis - Spam filled with words

On Tue, 09 Sep 2003 00:08:12 -0400, 
Yakov Shafranovich <research(_at_)solidmatrix(_dot_)com>:

I started getting weird spam samples in the last few 
days. The spam message consists of words, one after 
the other, with an image in the middle. Looks like 
another attempt to defeat the filters, here is a sample:


As Jose pointed out in his reply, these random invisible words do 
serve to "add bulk"--although any random text (even nonsense words) 
would serve that same purpose.

I have a pretty strong hunch about what these messages are trying to 
do.  Specifically, I think they're a clever attempt (by someone who 
doesn't really understand statistical language processing) to sneak 
past Bayesian classifiers.  And they succeed, the first time or two; 
but by the third time, the Bayesian classifier has identified at least 
two "tell-tale giveaways" that make these messages very easy to spot 
for any statistically based technology (including mine).

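To make that argument concrete, here is a minimal sketch of a 
token-presence Bayesian scorer.  It is not the algorithm of any 
particular product: the smoothing is a Robinson-style "degree of 
belief" estimate chosen for illustration, the tiny corpus is made up, 
and the marker tokens "<img>" and "color=white" are hypothetical 
stand-ins, since the post does not say what the actual giveaways are.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-presence spam scorer, for illustration only."""

    def __init__(self):
        self.seen = {"spam": Counter(), "ham": Counter()}  # messages containing token
        self.msgs = {"spam": 0, "ham": 0}                  # messages trained per class

    def train(self, label, tokens):
        self.msgs[label] += 1
        self.seen[label].update(set(tokens))

    def token_prob(self, tok, s=1.0, x=0.5):
        """Smoothed P(spam | token); tokens never seen stay at the neutral x."""
        b, g = self.seen["spam"][tok], self.seen["ham"][tok]
        n = b + g
        if n == 0:
            return x
        bad = b / max(1, self.msgs["spam"])
        good = g / max(1, self.msgs["ham"])
        p = bad / (bad + good)
        # Pull rarely seen tokens toward x; trust p more as n grows.
        return (s * x + n * p) / (s + n)

    def score(self, tokens, top=5):
        """Combine the `top` tokens farthest from neutral into one probability."""
        probs = sorted((self.token_prob(t) for t in set(tokens)),
                       key=lambda p: abs(p - 0.5), reverse=True)[:top]
        log_odds = sum(math.log(p / (1 - p)) for p in probs)
        return 1 / (1 + math.exp(-log_odds))


clf = TinyBayes()
for text in ["meeting moved to friday please review the draft",
             "lunch on friday the draft looks good",
             "please send the meeting notes"]:
    clf.train("ham", text.split())
for text in ["free pills click here now",
             "click here for a free offer now",
             "cheap pills free shipping"]:
    clf.train("spam", text.split())

# Hypothetical word-padded spam: two constant markers plus random words.
MARKERS = ["<img>", "color=white"]
first = MARKERS + "friday garden walnut river".split()
second = MARKERS + "notes harbor violin meadow".split()
third = MARKERS + "lunch pebble anchor thimble".split()

before = clf.score(third)    # markers and padding are all unseen, hence neutral
clf.train("spam", first)
clf.train("spam", second)
after = clf.score(third)     # the padding changed, the markers did not

print(f"before: {before:.2f}  after: {after:.2f}")
```

In this toy setup the third message scores well under 0.5 before the 
two padded samples are trained and well over 0.5 afterwards.  The 
random words are different every time, so they stay neutral; only the 
parts of the message that repeat accumulate evidence.
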
On an unrelated note: I've agreed to try to help coordinate the area 2 
analysis work for an indeterminately short time.  One of the things I 
would really like to do is to run a quick "pilot" study (and I pretty 
much don't care about what).  This study should be small, tightly 
focused, and (ideally) something that could be accomplished in, say, 
6 weeks or so.  The primary goal of this pilot would be to help folks 
working in area 2 to discover what (if any) unique mechanical 
requirements there may be to conducting an anti-spam research 
project.  (Think of it as a "shakedown run.")

Ideas for a possible focus for this pilot study are actively 
solicited.  My preference would be for folks to email me off-list 
with ideas, brainstorms, etc.  I will summarize, and then post the 
summary to the list.

- Terry



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg