ietf

Re: Why Spam is a problem

2002-08-19 08:59:20
Vernon Schryver wrote:

From: John Stracke <jstracke(_at_)centivinc(_dot_)com>

That would be somewhat less useful in this case, though, since each user has their own table of keywords.
That contradicts other assumptions about this mechanism

Whose? The author of the original article was very explicit that he was advocating users have individual tables.

and it points
out a major problem.  One assumption is that spam is a more rather
than less uniformly distributed flood.  If it is not uniform, how can
you hope that the statistical characteristics of previous samples will
be related to future samples from new spammers?  If spam is uniform,
then why do users need private tables of keywords?

I think it was to reduce false positives--because the profile of different users' legitimate mail is nonuniform.
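To make the point about private tables concrete, here is a rough sketch of per-user scoring. It is not the code from the original article: the class name UserFilter, the whitespace tokenizer, and the add-one smoothing are all assumptions chosen for illustration. The only thing it is meant to show is that the same message can score as spam against one user's table and as legitimate against another's.

```python
import math
from collections import Counter


class UserFilter:
    """Hypothetical per-user keyword table (illustration only)."""

    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        # Count each token once per message; the tokenizer is deliberately naive.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_log_odds(self, text):
        # Naive-Bayes-style log odds with add-one smoothing.
        # Positive means "looks like this user's spam", negative means
        # "looks like this user's legitimate mail".
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in set(text.lower().split()):
            p_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return log_odds


# Two users with different legitimate mail.
broker = UserFilter()   # someone who discusses mortgages for a living
broker.train("refinance mortgage rates today", is_spam=False)
broker.train("cheap pills online", is_spam=True)

other = UserFilter()    # someone who only sees mortgages in spam
other.train("refinance mortgage rates now", is_spam=True)
other.train("lunch meeting tomorrow", is_spam=False)

message = "mortgage rates"
broker_score = broker.spam_log_odds(message)  # negative: looks legitimate
other_score = other.spam_log_odds(message)    # positive: looks like spam
```

With a single shared table, one of those two users would get the wrong answer for that message.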

The major problem is that the mechanism requires a significant and
continuing false-negative rate to keep the scoring tuned as spammers
come and go.

I dunno; keeping the tuning up to date sounds like a strength to me. It requires some level of effort, but a much lower level than deleting every piece of spam by hand.

Of course, the main problem with any and every such system is that it
is looking for characteristics other than "unsolicited" and "bulk."

Yes, and the main problem with the DCC is that it does not.

When I moved last fall, I went through old mail, harvested the addresses of old friends, and sent out mail with my new address. Some of these people had never received email from me (they and I were CC:ed on the same messages from other friends), so I would not have been on their whitelist. I don't know how many people I sent to, but it was certainly more than 10--which you say counts as bulk. So, if at least 10 of those people had been using the DCC, then my message would have been tagged as UBE, and some of them would not have gotten it. I suppose one might argue this message was bulk email, but I knew every one of those people personally, considered them friends (even if I hadn't seen them since college), and had reason to believe that they would be at least somewhat pleased to keep track of me. Why should that be filtered?
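The scenario above can be sketched as a counting filter. This is only an illustration of the "more than 10 copies and not whitelisted" rule as I have described it, not the DCC's actual checksums or protocol: the names ChecksumCounter and is_tagged_bulk, the use of an exact SHA-256 of the body, and the example address are all assumptions.

```python
import hashlib
from collections import Counter

BULK_THRESHOLD = 10  # assumed from the "more than 10" figure discussed above


class ChecksumCounter:
    """Hypothetical shared counter of message-body checksums."""

    def __init__(self):
        self.counts = Counter()

    def report(self, body):
        # Each recipient's filter reports the checksum and gets back
        # the running count of reports for that same body.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.counts[digest] += 1
        return self.counts[digest]


def is_tagged_bulk(counter, body, sender, whitelist):
    # Tag the message only if it has been seen more than BULK_THRESHOLD
    # times and the recipient has not whitelisted the sender.
    count = counter.report(body)
    return count > BULK_THRESHOLD and sender not in whitelist


counter = ChecksumCounter()
body = "I have moved; here is my new address."
sender = "john@example.com"  # placeholder address

# Twelve old friends, none of whom has the sender on a whitelist.
results = [is_tagged_bulk(counter, body, sender, set()) for _ in range(12)]

# A thirteenth friend who does have the sender whitelisted.
whitelisted_result = is_tagged_bulk(counter, body, sender, {sender})
```

Under these assumptions the first ten recipients receive the message untagged, the eleventh and twelfth have it tagged, and only the recipient who happened to whitelist the sender is unaffected. Nothing in the rule looks at whether the mail was solicited or wanted.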

I'm not advocating the Bayesian approach as a silver bullet, mind you; but I think it's an interesting area to look into. Even if it doesn't work, the general idea of filtering based on personalized statistics could lead to something that works better.

--
/===============================================================\
|John Stracke      |jstracke(_at_)centivinc(_dot_)com                      |
|Principal Engineer|http://www.centivinc.com                    |
|Centiv            |My opinions are my own.                     |
|===============================================================|
|Both candidates are better than a megalomaniac mutant lab mouse|
|bent on world domination...but it's pretty close.              |
\===============================================================/




