[Asrg] Re: 2a. Analysis - Spam filled with words

On Tue, 9 Sep 2003 13:45:51 -0400,
"Hector Santos" <winserver(_dot_)support(_at_)winserver(_dot_)com> wrote:

These are called "tag injections."   It's been around for 
a while in HTML email.


Without meaning to seem disagreeable, I must disagree.  Hector's 
point that so-called tag injections/obfuscating comments have been 
around a while is well taken.  And he's quite correct that messages 
where the text is broken up or otherwise obfuscated are indeed 
intended to bypass simple keyword filters.

But in these "new" emails, exactly the opposite is true.  The text is 
*not* broken up; on the contrary, it's perfectly intact, but "hidden" 
from *human* readers.  

The thing that makes these "new" messages different is precisely the 
fact that they do *not* contain the nonsense words/random characters 
typical of obfuscating comments.  Instead, they contain literally 
dozens of "high-end" *content-rich* words, deliberately left intact.  
That's the "tell" (a poker term) that these messages are probably 
designed to confuse statistical language classifiers.  (Again, they 
don't work, won't work--and ultimately *can't* work, for reasons that 
are interesting only to people like me.)  Admittedly, this is based on 
a manual "training" run, but the Bayesian component of my statistical 
filter started "catching" these after seeing just two of them.
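
For anyone who wants to see the effect, here is a toy sketch.  It is 
not my actual filter: the TinyBayes class, the smoothing, and every 
sample message below are invented for illustration, and the scoring 
is a simplified naive-Bayes-style log-odds sum over the tokens present 
in a message.  It is trained on three legitimate messages and two 
word-stuffed spams, then asked to score a third stuffed spam that 
uses different filler words.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy token-presence classifier (illustrative only, not a real filter)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # Count each distinct token once per message.
        self.counts[label].update(set(text.lower().split()))
        self.docs[label] += 1

    def spam_score(self, text):
        # Log-odds of spam vs. ham; a positive score leans toward spam.
        score = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        for tok in set(text.lower().split()):
            # Add-one smoothing so unseen tokens never give a zero probability.
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_ham = (self.counts["ham"][tok] + 1) / (self.docs["ham"] + 2)
            score += math.log(p_spam / p_ham)
        return score


clf = TinyBayes()

# Hypothetical legitimate mail.
clf.train("meeting agenda for the quarterly budget review", "ham")
clf.train("please review the attached committee report", "ham")
clf.train("lunch on friday to discuss the project schedule", "ham")

# Two hypothetical spams padded with intact, content-rich filler words.
clf.train("buy cheap pills now symphony architecture quarterly "
          "theorem committee glacier", "spam")
clf.train("cheap pills online today harvest budget telescope "
          "review sonata project", "spam")

# A third stuffed spam with different filler, and an ordinary message.
new_stuffed = "cheap pills now meridian agenda schedule orchestra report"
legit = "please review the project schedule"

print(round(clf.spam_score(new_stuffed), 2))  # positive: leans spam
print(round(clf.spam_score(legit), 2))        # negative: leans ham
```

In this toy setup the filler words that overlap with legitimate mail 
pull the score down only slightly, while the sales tokens that the 
message still has to carry push it up by more.  That is at least 
consistent with what I saw in my manual run, though a five-message 
corpus proves nothing by itself.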

- Terry



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg