procmail

Re: Keep getting stubborn spam with random words

2004-03-09 23:04:10
There are 91 elements in HTML 4.0.  There are a few more "proprietary"
elements used by Netscape and IE, but all together there are probably not
many more than 150 elements in common use.  Wouldn't it be possible to do
something like:

:0 B
* </
* ! (elements in a list)
dumpit

I.e., the body contains the structure "</something" but it's not one of the
"white list" of known HTML (ending) elements.
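One wrinkle with the recipe as sketched: procmail ANDs its conditions and
matches each one against the whole body on its own, so the negated line
would only fire when no whitelisted name appears anywhere in the message.
Checking each closing tag individually is probably easier in a small helper
script. The following is a minimal sketch in Python, not a tested filter;
the names KNOWN_ELEMENTS and unknown_closing_tags are made up for
illustration, and the element set is a small subset rather than the full
list of 91.

```python
import re

# Illustrative subset only (an assumption for this sketch); a real list
# would cover every HTML 4 element plus any proprietary ones you accept.
KNOWN_ELEMENTS = {
    "a", "b", "body", "br", "div", "font", "head", "html", "i",
    "li", "p", "span", "table", "td", "title", "tr", "ul",
}

# Matches "</name" and captures the element name of a closing tag.
CLOSING_TAG = re.compile(r"</\s*([A-Za-z][A-Za-z0-9]*)")


def unknown_closing_tags(body, known=KNOWN_ELEMENTS):
    """Return the closing-tag names found in body that are not in known."""
    found = {name.lower() for name in CLOSING_TAG.findall(body)}
    return found - set(known)


if __name__ == "__main__":
    sample = "<p>Hello</p><xqzt>buy now</xqzt>"
    print(unknown_closing_tags(sample))
```

A message would be a candidate for "dumpit" whenever the returned set is
non-empty. Element names are compared case-insensitively, since HTML tags
may be written in either case.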

Of course "elements" like <flame></flame> or <rant></rant> would need to
be added to the "white list." :)

                                - fleet -

On Tue, 9 Mar 2004, Bob George wrote:


Skip Montanaro wrote:
[...]
In the Spambayes project this stuff is called "word salad".  (I doubt the
term originated with us.)  The one conclusion we've reached so far about it
is that it generally doesn't bother the accuracy of our classifier.

I've been following similar discussions on the spamassassin and
bogofilter lists. The general consensus is that the use of random words,
or even strings of text from classical literature, actually helps mark
the mail as spam. After all, I'm not likely to have strings of classical
literature in my mail, while a lit major will probably not have as much
technical jargon. Once trained, the bayes tools do a good job of
recognizing what's out of place, rather than what's "wrong".
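To make the "out of place" idea concrete, here is a toy sketch in Python.
It is a plain naive-Bayes style log-odds sum with add-one smoothing, not
the actual scoring used by Spambayes, SpamAssassin, or bogofilter, and the
training messages and function names are invented for illustration.

```python
import math
from collections import Counter


def train(messages):
    """Count, per token, how many messages of one class contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(text.lower().split()))
    return counts


def spam_log_odds(text, spam_counts, ham_counts, n_spam, n_ham):
    """Sum per-token log-odds with add-one smoothing; above 0 leans spam."""
    score = 0.0
    for token in set(text.lower().split()):
        p_spam = (spam_counts[token] + 1) / (n_spam + 2)
        p_ham = (ham_counts[token] + 1) / (n_ham + 2)
        score += math.log(p_spam / p_ham)
    return score


# Invented training data: the spam carries literary filler words,
# the ham is the recipient's usual technical jargon.
SPAM = ["cheap pills hamlet yonder wherefore", "cheap loans ophelia yonder thou"]
HAM = ["procmail recipe regex lockfile", "procmail regex formail maildir"]
SPAM_COUNTS, HAM_COUNTS = train(SPAM), train(HAM)


def score(text):
    """Score text against the toy training data above."""
    return spam_log_odds(text, SPAM_COUNTS, HAM_COUNTS, len(SPAM), len(HAM))


if __name__ == "__main__":
    print(score("cheap pills"))
    print(score("cheap pills yonder thou wherefore"))
```

In this toy setup the padded message scores higher than the bare one,
because the filler words have only ever been seen in spam. A word the
filter has never seen in either class contributes nothing either way.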

A fixed filter isn't necessarily impossible, but it will be tricky to
avoid false positives.

- Bob


_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail


