ietf-asrg

Re: [Asrg] Re: 2a. Analysis - Spam filled with words

2003-09-11 08:01:39
On 2003-09-11 14:13:50 +0100, Andrew Akehurst wrote:
Can I suggest a subtly different approach? Rather than trying to
characterise spam, why not try and characterise your legitimate 
messages and see if incoming messages match that statistical
profile?

My reasoning is based on the fact that the profile of spam 
undergoes sudden shifts as spammers switch to using new tactics 
each time their old ones become less effective. Whereas, in my
case anyway, the profile of the legitimate mail I receive is 
much more stable.

Bayesian classification systems have to undergo training in order
to learn what spam and "ham" look like. But because "spam" keeps
changing, re-training is needed over time. As time passes, the
class of spam will grow and become less clearly-defined because
the range of tactics used by spammers seems to increase. As the
definition of "spam" becomes fuzzier, does the accuracy of
filtering decrease?

Does this make a difference? Bayesian-like filters just sort mail into
two buckets, and you train them by telling them which messages belong
in bucket A and which belong in bucket B. In theory, the filter doesn't
care which is ham and which is spam. Practical implementations are
biased to make false negatives more likely than false positives, but I
don't think this changes the symmetry of the algorithm fundamentally.
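
To make the symmetry point concrete, here is a minimal sketch of a
two-bucket filter in Python. It is not taken from any particular
filter: the class name TwoBucketFilter, the whitespace tokenizer, the
add-one smoothing and the threshold parameter are all assumptions
chosen for illustration.

```python
import math
from collections import Counter


class TwoBucketFilter:
    """Illustrative naive-Bayes-style sorter over two buckets, "A" and "B".

    Hypothetical sketch: the code never knows which bucket is spam.
    """

    def __init__(self):
        self.words = {"A": Counter(), "B": Counter()}
        self.messages = {"A": 0, "B": 0}

    def train(self, bucket, text):
        # Assumed tokenizer: lower-case, split on whitespace.
        self.words[bucket].update(text.lower().split())
        self.messages[bucket] += 1

    def log_odds(self, text):
        """Log odds in favour of bucket A; positive means A is more likely."""
        # +1 reserves a slot for unseen words and avoids division by zero.
        vocab = len(set(self.words["A"]) | set(self.words["B"])) + 1
        total = {b: sum(self.words[b].values()) for b in "AB"}
        # Prior from message counts, with add-one smoothing.
        score = math.log((self.messages["A"] + 1) / (self.messages["B"] + 1))
        for w in text.lower().split():
            p_a = (self.words["A"][w] + 1) / (total["A"] + vocab)
            p_b = (self.words["B"][w] + 1) / (total["B"] + vocab)
            score += math.log(p_a / p_b)
        return score

    def classify(self, text, threshold=0.0):
        # The only asymmetry is this threshold: raising it demands more
        # evidence before a message is put in bucket A.
        return "A" if self.log_odds(text) > threshold else "B"
```

Swapping the labels used in training only flips the sign of log_odds(),
so the arithmetic is the same whichever bucket holds the spam. If
bucket A is taken to be spam, a threshold above zero gives the bias
towards false negatives mentioned above without changing anything else.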

        hp


-- 
   _  | Peter J. Holzer    | Humor ohne Emoticons ist trockener Humor.
|_|_) | Sysadmin WSR       | 
| |   | hjp(_at_)hjp(_dot_)at         | -- Toni Grass in aip
__/   | http://www.hjp.at/ |
