procmail

Re: Keep getting stubborn spam with random words

2004-03-10 00:28:11
Professional Software Engineering wrote:

[...]
Some simply use random runs of English words.

That's where Bayes is particularly powerful. It doesn't matter what words they use, but rather how they compare to your normal/spam patterns.

Note that "Hungarian notation", a variable-naming scheme used widely in Windows-based software development (though certainly not restricted to it), can easily trip a consonant-weighing filter.

Exactly. Some of the obfuscation detection rules play merry hell with code. Bayes (depending on how it is implemented) doesn't really care WHAT the letters/words are, just what the patterns look like. If somebody posts a passage from Dante's Inferno along with their spam, it sure won't look like C keywords (and vice versa).
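
To make the point concrete, here is a toy sketch of token-frequency (naive Bayes style) scoring in Python. It is not how any particular filter is implemented; the function names (`tokens`, `train`, `spam_log_odds`) and the tiny training messages are made up for illustration, and it assumes equal prior odds for spam and normal mail.

```python
import math
import re
from collections import Counter

def tokens(text):
    # Hypothetical tokenizer: lowercase runs of letters, underscores, apostrophes.
    return re.findall(r"[a-z_']+", text.lower())

def train(messages):
    # Count how often each token appears in one class of mail.
    counts = Counter()
    for msg in messages:
        counts.update(tokens(msg))
    return counts

def spam_log_odds(text, spam_counts, ham_counts):
    # Sum of per-token log-likelihood ratios with add-one smoothing.
    # Positive means "looks more like the spam corpus", negative means
    # "looks more like the normal-mail corpus". Priors are assumed equal.
    vocab = len(set(spam_counts) | set(ham_counts))
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    score = 0.0
    for tok in tokens(text):
        p_spam = (spam_counts[tok] + 1) / (spam_total + vocab)
        p_ham = (ham_counts[tok] + 1) / (ham_total + vocab)
        score += math.log(p_spam / p_ham)
    return score

# Made-up training data: normal mail that discusses C code, and spam
# padded with literary filler.
ham_counts = train([
    "int main void return static char buffer",
    "while loop struct pointer return int",
])
spam_counts = train([
    "buy cheap pills now midway through the dark wood",
    "cheap offer now abandon all hope",
])

code_score = spam_log_odds("static int return", spam_counts, ham_counts)
filler_score = spam_log_odds("cheap pills in the dark wood", spam_counts, ham_counts)
```

With this training data `code_score` comes out negative and `filler_score` positive: the scorer never asks whether the words are "real" English, only which corpus they resemble.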

[...]

Of course, if you don't discuss code in email, then these issues might not present themselves, but I humbly submit that anyone writing code to weigh consonant:vowel ratios should definitely run it against source code, in various programming languages, to see how it will react.
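
A quick way to try that experiment is a few lines of Python. This is only a sketch of one possible consonant:vowel measure, not any existing filter's rule: the function name `consonant_vowel_ratio` is hypothetical, "y" is counted as a consonant, and the sample identifiers are just typical Hungarian-notation names.

```python
def consonant_vowel_ratio(text):
    # Ratio of consonants to vowels among the ASCII letters in text.
    # Assumption: only a, e, i, o, u count as vowels ("y" is a consonant).
    letters = [c for c in text.lower() if c.isascii() and c.isalpha()]
    vowels = sum(1 for c in letters if c in "aeiou")
    consonants = len(letters) - vowels
    # Avoid dividing by zero for strings that contain no vowels at all.
    return consonants / max(vowels, 1)

# Plain English words versus Hungarian-notation identifiers.
english_ratio = consonant_vowel_ratio("random words of English")
hungarian_ratios = {
    name: consonant_vowel_ratio(name)
    for name in ("lpszClassName", "hwndParent")
}
```

Both identifiers score noticeably higher than the English phrase, so a threshold tuned only on ordinary prose could flag perfectly legitimate code.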

Yes, fixed patterns are always likely to be error-prone -- one way or the other. Not to say that they're not useful, but there's no one-size-fits-all filter based on words or predictable patterns.

The same applies to virus detection (as opposed to simple filtering).

- Bob

_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail