Re: Why Spam is a problem

Vernon Schryver wrote:

From: John Stracke <jstracke(_at_)centivinc(_dot_)com>
That would be somewhat less useful in this case, though, since each user has their own table of keywords.
That contradicts other assumptions about this mechanism

Whose? The author of the original article was very explicit that he was advocating that users have individual tables.

and it points
out a major problem.  One assumption is that spam is a more rather
than less uniformly distributed flood.  If it is not uniform, how can
you hope that the statistical characteristics of previous samples will
be related to future samples from new spammers?  If spam is uniform,
then why do users need private tables of keywords?

I think it was to reduce false positives--because the profile of different users' legitimate mail is nonuniform.

The major problem is that the mechanism requires a significant and
continuing false-negative rate to keep the scoring tuned as spammers
come and go.

I dunno; keeping the tuning up to date sounds like a strength to me. It requires some level of effort, but a much lower level than deleting every piece of spam by hand.

Of course, the main problem with any and every such system is that it
is looking for characteristics other than "unsolicited" and "bulk."

Yes, and the main problem with the DCC is that it does not.

When I moved last fall, I went through old mail, harvested the addresses of old friends, and sent out mail with my new address. Some of these people had never received email from me (they and I were CC:ed on the same messages from other friends), so I would not have been on their whitelist. I don't know how many people I sent to, but it was certainly more than 10--which you say counts as bulk. So, if at least 10 of those people had been using the DCC, then my message would have been tagged as UBE, and some of them would not have gotten it. I suppose one might argue this message was bulk email, but I knew every one of those people personally, considered them friends (even if I hadn't seen them since college), and had reason to believe that they would be at least somewhat pleased to keep track of me. Why should that be filtered?
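
To make the scenario concrete, here is a rough sketch of the kind of counting I have in mind. It is not the DCC's actual protocol or code: the ChecksumCounter class, the is_tagged_bulk helper, the example address, and the fixed threshold of 10 are all assumptions for illustration, the last taken from the "more than 10" figure above.

```python
import hashlib
from collections import Counter

# Assumption: "bulk" means the same body was reported more than 10 times.
BULK_THRESHOLD = 10


class ChecksumCounter:
    """Toy stand-in for a shared clearinghouse (hypothetical, not the real DCC)."""

    def __init__(self):
        self.counts = Counter()

    def report(self, body: str) -> int:
        # Each recipient's filter reports a checksum of the message body
        # and gets back the running total for that checksum.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.counts[digest] += 1
        return self.counts[digest]


def is_tagged_bulk(count: int, sender: str, whitelist: set) -> bool:
    # A message is tagged only if it is over the threshold AND the
    # recipient has not whitelisted the sender.
    return count > BULK_THRESHOLD and sender not in whitelist


clearinghouse = ChecksumCounter()
note = "I have moved; here is my new address."
sender = "john@example.invalid"  # placeholder address

# Twelve old friends, none of whom has the sender on a whitelist.
results = []
for _ in range(12):
    count = clearinghouse.report(note)
    results.append(is_tagged_bulk(count, sender, whitelist=set()))
```

Under these assumptions the first ten recipients get the note and the last two see it tagged, even though nothing about the message is unsolicited in any meaningful sense. A recipient who had whitelisted the sender would be unaffected, which is exactly the case that did not apply here.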

I'm not advocating the Bayesian approach as a silver bullet, mind you; but I think it's an interesting area to look into. Even if it doesn't work, the general idea of filtering based on personalized statistics could lead to something that works better.
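
For what it's worth, here is a minimal sketch of what I mean by personalized statistics. It is a plain naive-Bayes word count with add-one smoothing, not the exact scoring from the original article; the UserFilter class, its method names, and the sample messages are all made up for illustration.

```python
import math
from collections import Counter


class UserFilter:
    """One user's private token table (hypothetical sketch)."""

    def __init__(self):
        self.spam = Counter()  # messages containing each token, spam
        self.ham = Counter()   # messages containing each token, legitimate
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text: str, is_spam: bool) -> None:
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text: str) -> float:
        # Sum per-token log-odds, assuming equal priors and add-one
        # smoothing so unseen tokens do not zero out the estimate.
        log_odds = 0.0
        for tok in set(text.lower().split()):
            p_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))


# Two users receive the same spam but have different legitimate mail.
alice, bob = UserFilter(), UserFilter()
for user in (alice, bob):
    user.train("cheap mortgage rates now", is_spam=True)
alice.train("lunch on friday", is_spam=False)
bob.train("mortgage rates for my client", is_spam=False)

alice_score = alice.spam_probability("mortgage rates")
bob_score = bob.spam_probability("mortgage rates")
```

With these toy tables the same two words look spammy to alice, who never sees them in real mail, and neutral to bob, who does. That is the false-positive argument for private tables in miniature; whether it holds up on real mail volumes is a separate question.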


--
/===============================================================\
|John Stracke      |jstracke(_at_)centivinc(_dot_)com                      |
|Principal Engineer|http://www.centivinc.com                    |
|Centiv            |My opinions are my own.                     |
|===============================================================|
|Both candidates are better than a megalomaniac mutant lab mouse|
|bent on world domination...but it's pretty close.              |
\===============================================================/