Professional Software Engineering wrote:
[...]
Some simply use random runs of English words.
That's where Bayes is particularly powerful. It doesn't matter what
words they use, but rather how they compare to your normal/spam patterns.
Note that "Hungarian notation", a variable-naming scheme used widely
in Windows-based software development (though certainly not restricted
to it), can easily trip a consonant-weighing filter.
Exactly. Some of the obfuscation detection rules play merry hell with
code. Bayes (depending on how it's implemented) doesn't really care WHAT the
letters/words are, just what the patterns look like. If somebody posts a
passage from Dante's Inferno along with their spam, it sure won't look
like C keywords (and vice versa).
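
To make the point about patterns concrete, here is a minimal sketch of
token-based Bayesian scoring. It is not how any particular filter does it:
the function names, the whitespace tokenizer, the add-one smoothing and the
equal priors are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter

def train(messages):
    """Count how often each whitespace-separated token appears (hypothetical helper)."""
    counts = Counter()
    for msg in messages:
        counts.update(msg.lower().split())
    return counts

def spam_log_odds(text, spam_counts, ham_counts):
    """Sum of per-token log(P(token|spam) / P(token|ham)).

    Positive means the tokens look more like the spam corpus, negative
    more like the normal-mail corpus. Assumes equal priors and add-one
    smoothing; real filters differ in both.
    """
    vocab = len(set(spam_counts) | set(ham_counts))
    spam_total = sum(spam_counts.values())
    ham_total = sum(ham_counts.values())
    score = 0.0
    for tok in text.lower().split():
        # The extra +1 in the denominator reserves mass for unseen tokens.
        p_spam = (spam_counts[tok] + 1) / (spam_total + vocab + 1)
        p_ham = (ham_counts[tok] + 1) / (ham_total + vocab + 1)
        score += math.log(p_spam / p_ham)
    return score
```

Trained on a mailbox where normal mail contains C code, tokens such as
"static" or "int" pull the score toward normal mail no matter how odd they
look letter by letter, which is the behaviour described above.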
[...]
Of course, if you don't discuss code in email, then these issues might
not present themselves, but I humbly submit that anyone writing code
to weigh consonant:vowel ratios should definitely run it against
source code, in various programming languages, to see how it will react.
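
A quick way to try this is sketched below, assuming a very simple
per-token check. The threshold of 3.0, the function names and the choice
to treat "y" as a consonant are illustrative assumptions, not taken from
any existing filter.

```python
VOWELS = set("aeiou")

def consonant_vowel_ratio(token):
    """Consonants divided by vowels for one token, ignoring non-letters.

    A token with no vowels is divided by 1 to avoid a zero division.
    "y" is counted as a consonant (an assumption of this sketch).
    """
    letters = [c for c in token.lower() if c.isalpha()]
    vowels = sum(1 for c in letters if c in VOWELS)
    consonants = len(letters) - vowels
    return consonants / max(vowels, 1)

def flagged_tokens(text, threshold=3.0):
    """Return the tokens a naive consonant-weighing rule would flag."""
    return [t for t in text.split() if consonant_vowel_ratio(t) >= threshold]

# Hungarian-notation identifiers trip the rule; ordinary prose does not.
print(flagged_tokens("HWND hWnd = lpszStr; the message"))
print(flagged_tokens("please read the message"))
```

Feeding it real source files in a few languages shows quickly how many
legitimate identifiers such a rule would penalize.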
Yes, fixed patterns are always likely to be error-prone -- one way or
the other. Not to say that they're not useful, but there's no
one-size-fits-all filter based on words or predictable patterns.
The same applies to virus detection (as opposed to simple filtering).
- Bob
_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail