Re: [Asrg] Meaningless words in spams

At 6:01 PM +0000 2/16/04, Matt Schneider wrote:

At 12:28 PM 2/16/2004 -0500, you wrote:
so, i guess a sliding window would catch filter-busting headers and trailers.
No, they add a bunch of garbage right in the body of the spams too, fake HTML tags or text that's the same color as the background.
There's no real way to avoid this stuff.

Those are both really quite easy to catch, and can even be caught by automatic learning filters. For example, the word 'oblivity' inside angle brackets (i.e. a bogus HTML tag) occurs nowhere at all in any of my legitimate mail of the past year. It occurs 6 times in my spam of 2004. A filter that checks for strict HTML compliance in HTML mail would have caught all of those, and I see in my current set of Bayesian classifiers that this 'word' (complete with <>) is part of why the later spams containing it were marked as probable spam. Similarly, text that is the same color as the background is a programmatically detectable trick, and there are already filters in use that detect it as spamsign.
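
To make the two checks above concrete, here is a minimal sketch in Python using the standard library's html.parser. It is not the filter described in this message; the tag whitelist, the class name SpamsignParser, and the function find_spamsign are illustrative assumptions, and the color check only handles the simple case of a <font color> that exactly matches the <body bgcolor> string.

```python
from html.parser import HTMLParser

# Assumption: a small illustrative whitelist, not the full set of HTML tags.
KNOWN_TAGS = {
    "html", "head", "title", "body", "p", "br", "a", "b", "i", "u",
    "font", "span", "div", "table", "tr", "td", "img", "center",
}


class SpamsignParser(HTMLParser):
    """Collects unknown tags and text colored the same as the background."""

    def __init__(self):
        super().__init__()
        self.bogus_tags = []     # tag names not in KNOWN_TAGS
        self.hidden_text = []    # text whose font color equals bgcolor
        self.bgcolor = None      # value of <body bgcolor=...>, lowercased
        self._font_colors = []   # stack of colors from open <font> tags

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in KNOWN_TAGS:
            self.bogus_tags.append(tag)
            return
        if tag == "body" and attrs.get("bgcolor"):
            self.bgcolor = attrs["bgcolor"].lower()
        if tag == "font":
            self._font_colors.append((attrs.get("color") or "").lower())

    def handle_endtag(self, tag):
        if tag == "font" and self._font_colors:
            self._font_colors.pop()

    def handle_data(self, data):
        text = data.strip()
        if (text and self.bgcolor and self._font_colors
                and self._font_colors[-1] == self.bgcolor):
            self.hidden_text.append(text)


def find_spamsign(html_text):
    """Return (bogus_tags, hidden_text) for one HTML message body."""
    parser = SpamsignParser()
    parser.feed(html_text)
    parser.close()
    return parser.bogus_tags, parser.hidden_text
```

Calling find_spamsign on a body containing <oblivity> and a white-on-white <font> run would report both. A real filter would also need to handle CSS colors, named colors, and near-matching shades, which this sketch ignores.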

I also note in peeking at my current Bayesian classifiers that there are many perfectly valid but uncommonly used words there which seem to be strong spamsign for no obvious reason. At least no obvious reason until I look at where in my recent spam they have appeared: the filterbusting attempts that use random dictionary words. A quick browse of the 200k entries in my filter collection and the accuracy it shows leads me to believe that the spammers who try to break filtering are still losing the arms race and may not ever hit on a winning tactic.
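
As a rough illustration of why rare dictionary words and bogus tags end up as strong spamsign, here is a simplified per-token score in Python. It is a sketch under stated assumptions, not the classifier discussed here: the tokenizer, the function names, and the add-one smoothing are all illustrative choices, and a real Bayesian filter combines many token scores per message.

```python
import re
from collections import Counter

# Assumption: keep a bracketed word such as "<oblivity>" as a single token,
# so that a bogus tag is learned together with its angle brackets.
TOKEN_RE = re.compile(r"<[a-z]+>|[a-z]+")


def tokenize(text):
    """Split a message into lowercase word and <tag> tokens."""
    return TOKEN_RE.findall(text.lower())


def train(messages):
    """Count, for each token, how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts


def spam_probability(token, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | token), assuming equal priors for spam and ham.

    Add-one smoothing keeps tokens seen in only one corpus from
    scoring exactly 0 or 1.
    """
    p_in_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_in_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_in_spam / (p_in_spam + p_in_ham)
```

A token that appears only in spam, whether it is a fake tag or an uncommon dictionary word used as filler, scores well above 0.5 because it has no legitimate mail to offset it. That is the effect described above: the filler words themselves become evidence.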

--

Bill Cole
bill(_at_)scconsult(_dot_)com



_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg