ietf-asrg

RE: [Asrg] 3. Requirements - Non Spam must go through

2003-07-11 10:55:31
"Walter Dnes" <waltdnes(_at_)waltdnes(_dot_)org>

  Welcome to...

  Walter's 1st (f)law of content-based spam detection;
  You can *NOT* use content-based spam detection against email from a
mailing list that discusses spam.

Sure you can. In fact, I do.

  Think about it for a minute.  On any spam-discussion mailing list,
it is on topic to show examples of spam.  If a spam-detector is working
properly, it *WILL* detect the spam-samples, and flag the
spam-discussion email as spam.

Think about it for a minute. If a human can distinguish between examples
of spam under discussion and actual spam, then it is theoretically possible
for a content-based filter to duplicate that reasoning. And it turns out to
be possible in practice as well as in theory.

  Corollary 1
  Content-based spam detection is imperfect, but if you *INSIST* on
using it, the best approach requires that you...
  a) absolutely whitelist any spam-discussion mailing-lists

A whitelist *is* one way to make the distinction (after all, a human says, 
"This comes from a user of the ASRG list, therefore it is something I
probably want.") But fuzzier methods work as well.

  b) do *NOT* include emails from spam-discussion mailing-lists in the
     filter's "learning" mode.

To the contrary, there are still enough differences between spam samples
posted to such lists and real spam that a well-designed filter can learn to
tell them apart almost as easily as it learns the difference between
"normal" spam and legitimate email. Think about the headers in general, as
well as the way such example messages are usually forwarded.
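To make the header point concrete, here is a minimal sketch, in Python, of how a statistical filter's tokenizer might keep those differences visible to the learner. It assumes a single-part message and assumes forwarded samples are quoted with a leading ">". The header list (List-Id, From, Received) and the name tokenize() are illustrative choices, not taken from any particular filter. Because each token is prefixed with where it appeared, "free" in a quoted sample and "free" in a message body become separate features that the filter can weigh differently. A List-Id token also lets the filter learn that list traffic is usually wanted, instead of being told so by an absolute whitelist.

```python
import re
from email import message_from_string
from email.message import Message

# Headers whose values tend to differ between list traffic and direct spam.
# This selection is an illustrative assumption, not a recommendation.
FEATURE_HEADERS = ("List-Id", "From", "Received")

# Simple word tokens; "$", "'", "!" and "-" are kept because spam often uses them.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]+")


def tokenize(msg: Message) -> list[str]:
    """Return filter features, each prefixed with where the token appeared."""
    features: list[str] = []
    for name in FEATURE_HEADERS:
        for value in msg.get_all(name, []):
            features += [f"{name.lower()}:{t.lower()}" for t in TOKEN_RE.findall(value)]
    body = msg.get_payload()
    if isinstance(body, str):  # single-part message assumed
        for line in body.splitlines():
            # Lines quoted with ">" are assumed to be forwarded examples, not new text.
            prefix = "quoted:" if line.lstrip().startswith(">") else "body:"
            features += [prefix + t.lower() for t in TOKEN_RE.findall(line)]
    return features


raw = (
    "From: Jason <j@example.org>\n"
    "List-Id: Anti-Spam Research Group <asrg.ietf.org>\n"
    "Subject: example\n"
    "\n"
    "Here is a sample:\n"
    "> FREE money\n"
)
features = tokenize(message_from_string(raw))
```

With that input, the quoted sample yields "quoted:free" rather than "body:free", and the list header contributes tokens such as "list-id:asrg", so a learner trained on these features sees the forwarded sample and the list membership as distinct evidence.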

  Walter's 2nd (f)law of content-based spam detection;
  Even 100% correct (0% false positives and 0% false negatives)
content-based spam detection that properly flags 14 megabytes of spam and 1
megabyte of non-spam is useless given an inbox with 5 or 10 megabytes
of capacity.

Now *this* is very true. However, content-based filtering can be
an important part of a 2-tier anti-spam solution. Spam steals both
system resources and human time. A content-based filter can
address the latter and provide information used to 'train' the
source-based defenses that address the former.
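One way to wire the two tiers together, sketched in Python under stated assumptions: the content filter reports each verdict along with the sending IP, and a source-based tier refuses connections from IPs that have accumulated too many spam verdicts. The class name SourceReputation, the per-IP keying, and the default threshold of 3 are all hypothetical. Real source-based defenses (DNS blocklists, rate limits, reputation services) are considerably more involved.

```python
from dataclasses import dataclass, field


@dataclass
class SourceReputation:
    """Hypothetical source-based tier that learns from content-filter verdicts."""

    threshold: int = 3  # spam verdicts before a source is refused (assumed value)
    spam_counts: dict[str, int] = field(default_factory=dict)

    def record_verdict(self, source_ip: str, is_spam: bool) -> None:
        # The content filter reports back after a message has been accepted.
        if is_spam:
            self.spam_counts[source_ip] = self.spam_counts.get(source_ip, 0) + 1

    def accept_connection(self, source_ip: str) -> bool:
        # Decided before any message data is transferred, so refused mail
        # never consumes disk space or mailbox quota.
        return self.spam_counts.get(source_ip, 0) < self.threshold


rep = SourceReputation(threshold=2)
rep.record_verdict("192.0.2.1", True)
rep.record_verdict("192.0.2.1", True)
rep.record_verdict("198.51.100.7", False)
```

The design choice is that the expensive content check runs only on mail that was accepted, while its conclusions let the cheap connection-time check turn away later mail from the same source before it can fill a 5 or 10 megabyte inbox.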

jason


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg