procmail

RE: Keep getting stubborn spam with random words

2004-03-10 10:29:31


From: Jay Moore
Sent: Wednesday, March 10, 2004 8:54 AM
[...]

But again, including these "word salad"/"zombie"/"hammy" inputs in the
training corpus for a Bayesian classifier will have an effect on how it
performs. It would be snake oil to claim otherwise.


It is not necessarily true that these "word salad" tactics decrease
the effectiveness of the Bayes algorithm. Several of the Bayes
implementations use word pairs, or a simple sliding window of
n-character "words" paired together, as the lookup token. If the spammer
chooses commonly occurring words, but arranges them in pairs that don't
occur often in normal language use, then those pairs will register as a
spam signal. Of course, a given pair rarely occurs again, so such tokens
will eventually be aged out because they are encountered so infrequently.
Some implementors have suggested biasing new words slightly in favor of
spam rather than ignoring them or weighting them neutrally.
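
To make the word-pair idea concrete, here is a rough Python sketch. It is
not taken from any particular filter: the names pair_tokens and
PairClassifier, the 0.6 score for unseen pairs, and the 0.01/0.99 clamps
are all assumptions chosen for illustration, and aging out of stale
tokens is left out.

```python
import math
import re
from collections import Counter


def pair_tokens(text):
    """Split text into lowercase words and return adjacent word pairs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


class PairClassifier:
    """Toy Bayesian scorer keyed on word pairs (illustrative only)."""

    def __init__(self, unseen_prob=0.6):
        self.spam = Counter()   # pair -> number of spam messages containing it
        self.ham = Counter()    # pair -> number of ham messages containing it
        self.n_spam = 0
        self.n_ham = 0
        # Assumed value: never-seen pairs lean slightly toward spam.
        self.unseen_prob = unseen_prob

    def train(self, text, is_spam):
        tokens = set(pair_tokens(text))  # count each pair once per message
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def token_prob(self, token):
        s, h = self.spam[token], self.ham[token]
        if s + h == 0:
            return self.unseen_prob
        spam_rate = s / max(self.n_spam, 1)
        ham_rate = h / max(self.n_ham, 1)
        p = spam_rate / (spam_rate + ham_rate)
        return min(0.99, max(0.01, p))  # keep away from 0 and 1

    def score(self, text):
        """Combine per-pair probabilities into one spam probability."""
        log_odds = 0.0
        for token in set(pair_tokens(text)):
            p = self.token_prob(token)
            log_odds += math.log(p) - math.log(1.0 - p)
        log_odds = max(-50.0, min(50.0, log_odds))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-log_odds))


clf = PairClassifier()
clf.train("please review the meeting agenda", is_spam=False)
clf.train("the meeting agenda is attached", is_spam=False)
clf.train("agenda please meeting the review", is_spam=True)

salad_score = clf.score("agenda please meeting")
normal_score = clf.score("the meeting agenda")
```

Every word in the spam sample also appears in the ham samples, so a
single-word tokenizer would learn little from it. The pairs "agenda
please" and "please meeting" have only been seen in spam, so salad_score
comes out close to 1, while normal_score comes out close to 0.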

It is probably worth noting that if the spammers ripped off actual
web page content, or replicated communications (without credit), they
might be seen as committing a copyright violation, which would give a
motivated plaintiff some extra leverage if they decided to act
against the spammer. That said, usurping the resources of unsuspecting
owners of compromised PCs doesn't seem to give the spammers much pause.



_______________________________________________
procmail mailing list
procmail(_at_)lists(_dot_)RWTH-Aachen(_dot_)DE
http://MailMan.RWTH-Aachen.DE/mailman/listinfo/procmail