Laird Breyer wrote:
On Sep 20 2004, Markus Stumpf wrote:
The Spammers' Compendium
http://www.jgc.org/tsc/
has a list of tricks spammers use to beat Bayesian filters.
...
Clearly, this is entirely untypical of ordinary language. Like the
nonsense words, this sticks out (e.g. what percentage of legitimate
messages do *you* have that don't contain the word "the"?).
Many. Two examples: as I live in France, most messages don't contain
the word "the", as they are written in French. Also, people at our
organisation send and receive messages in many other languages:
German, Italian, Russian, and even Chinese... and English, of course.
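To make the objection concrete, here is a minimal Python sketch of a hypothetical rule that treats a message as unusual when it never contains the word "the". The function name `lacks_the` and both sample messages are invented for illustration. A perfectly legitimate French message trips the rule.

```python
import string

def lacks_the(text: str) -> bool:
    """Hypothetical rule: True when the English word "the" never appears."""
    # Strip surrounding punctuation and lowercase each whitespace-separated token.
    words = {w.strip(string.punctuation).lower() for w in text.split()}
    return "the" not in words

french = "Bonjour, la réunion est reportée à demain."
english = "Please see the attached report."

print(lacks_the(french))   # True: legitimate, but flagged
print(lacks_the(english))  # False
```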
In fact, many sequences will recur if a spammer sends several messages
of this type. Even without splitting on punctuation, various parts of
the "typefaces" recur, such as '888' which is used in 'n', 'o', '!'.
So the filter will automatically learn that messages with high frequencies
of '888' tend to be junk.
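A minimal sketch of how a token-counting filter ends up associating '888' with junk. It assumes a simple Laplace-smoothed estimate of P(spam | token) with equal class priors, which is not necessarily the estimator used by any filter discussed in this thread; the helper names and the training messages are invented toy data.

```python
from collections import Counter

def document_frequencies(messages):
    """Count, for each whitespace-separated token, how many messages contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.split()))
    return counts

def spamminess(token, spam_counts, ham_counts, n_spam, n_ham):
    """Laplace-smoothed P(spam | token), assuming equal class priors."""
    p_tok_spam = (spam_counts[token] + 1) / (n_spam + 2)
    p_tok_ham = (ham_counts[token] + 1) / (n_ham + 2)
    return p_tok_spam / (p_tok_spam + p_tok_ham)

# Toy training data (hypothetical, for illustration only).
spam = ["888 888 o888o buy", "888 `888' now", "888 888 viagra"]
ham = ["meeting at 10", "the report is attached", "lunch tomorrow"]

spam_counts = document_frequencies(spam)
ham_counts = document_frequencies(ham)

for tok in ("888", "the"):
    print(tok, round(spamminess(tok, spam_counts, ham_counts, len(spam), len(ham)), 3))
```

The smoothing keeps a token seen only in spam from receiving a probability of exactly 1, so one legitimate message containing it later does not produce a division by zero or a hard verdict.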
What is missing, and what I have seen lately, is the use of, e.g.,
|_)
|_)|_|\/
/
_ ___
| | / (_)___ _____ __________ _
| | / / / __ `/ __ `/ ___/ __ `/
| |/ / / /_/ / /_/ / / / /_/ /
|___/_/\__,_/\__, /_/ \__,_/
/____/
.o.
888
ooo. .oo. .ooooo. oooo oooo ooo 888
`888P"Y88b d88' `88b `88. `88. .8' Y8P
888 888 888 888 `88..]88..8' `8'
888 888 888 888 `888'`888' .o.
o888o o888o `Y8bod8P' `8' `8' Y8P
to beat a Bayesian filter.
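The recurrence of '888' mentioned earlier is easy to check on the "now!" banner quoted above. This Python sketch pastes the banner in as it appears here and splits on whitespace only, with no punctuation handling; even so, bare '888' tokens dominate, and the substring occurs more often still inside tokens such as 'o888o'.

```python
from collections import Counter

# The "now!" banner quoted above, as plain text.
BANNER = r"""
.o.
888
ooo. .oo. .ooooo. oooo oooo ooo 888
`888P"Y88b d88' `88b `88. `88. .8' Y8P
888 888 888 888 `88..]88..8' `8'
888 888 888 888 `888'`888' .o.
o888o o888o `Y8bod8P' `8' `8' Y8P
"""

# Whitespace-only tokenization: no splitting on punctuation.
tokens = Counter(BANNER.split())

print(tokens["888"])        # 10 bare '888' tokens
print(BANNER.count("888"))  # 15 occurrences as a substring
```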
...
A statistical filter will recognize all these things automatically.
Maybe, but there are many legitimate senders, and even companies, which
use this kind of message composition (Buy ... now) to add a footer to
all their messages. So: false positives...
In these cases, for this to be acceptable, I define "ALL" as 100%,
and "MOST OF THE TIME" as 99.99%.
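The two definitions translate directly into counts of misfiled legitimate mail. A small Python sketch, assuming a hypothetical volume of 10,000 legitimate messages per day (the figure is illustrative, not from the thread):

```python
def expected_false_positives(n_legit: int, accuracy: float) -> float:
    """Expected number of legitimate messages misclassified at a given accuracy."""
    return n_legit * (1.0 - accuracy)

DAILY_LEGIT = 10_000  # hypothetical daily volume, for illustration only

for label, acc in (("ALL", 1.0), ("MOST OF THE TIME", 0.9999)):
    fp = expected_false_positives(DAILY_LEGIT, acc)
    print(f"{label}: about {fp:.1f} false positives per day")
```

So even at 99.99%, roughly one legitimate message in every 10,000 would still be misfiled.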
--
---------------------------------------------------------------
Jose Marcio MARTINS DA CRUZ Tel. :(33) 01.40.51.93.41
Ecole des Mines de Paris http://j-chkmail.ensmp.fr
60, bd Saint Michel http://www.ensmp.fr/~martins
75272 - PARIS CEDEX 06
mailto:Jose-Marcio(_dot_)Martins(_at_)ensmp(_dot_)fr
_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg