Vernon Schryver wrote:
From: John Stracke <jstracke(_at_)centivinc(_dot_)com>
That would be somewhat less useful in this case, though, since each user
has their own table of keywords.
That contradicts other assumptions about this mechanism
Whose? The author of the original article was very explicit that he was
advocating users have individual tables.
and it points
out a major problem. One assumption is that spam is a more rather
than less uniformly distributed flood. If it is not uniform, how can
you hope that the statistical characteristics of previous samples will
be related to future samples from new spammers? If spam is uniform,
then why do users need private tables of keywords?
I think it was to reduce false positives--because the profile of
different users' legitimate mail is nonuniform.
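To make that concrete, here is a rough sketch of what I mean by per-user tables. It is not the code from the original article; the class name, the whitespace tokenizer, and the smoothing constants are all my own assumptions, and a real filter would need a far better tokenizer and far more training mail. Two users see the same spam but different legitimate mail, so the same message scores differently for each.

```python
import math
from collections import Counter


class UserFilter:
    """Hypothetical per-user token table (illustrative, not the article's code)."""

    def __init__(self):
        self.spam = Counter()   # token -> number of spam messages containing it
        self.ham = Counter()    # token -> number of legitimate messages containing it
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        # Count each token once per message.
        tokens = set(text.lower().split())
        if is_spam:
            self.spam.update(tokens)
            self.n_spam += 1
        else:
            self.ham.update(tokens)
            self.n_ham += 1

    def spam_probability(self, text):
        # Naive Bayes in log-odds form with add-one smoothing (an assumption).
        log_odds = math.log((self.n_spam + 1) / (self.n_ham + 1))
        for tok in set(text.lower().split()):
            p_spam = (self.spam[tok] + 1) / (self.n_spam + 2)
            p_ham = (self.ham[tok] + 1) / (self.n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


# Same spam for both users, different legitimate mail.
broker, engineer = UserFilter(), UserFilter()
for user in (broker, engineer):
    user.train("lowest mortgage rates act now", True)
    user.train("refinance mortgage today", True)

broker.train("client mortgage application attached", False)
broker.train("mortgage rates meeting tuesday", False)
engineer.train("build failed on branch", False)
engineer.train("code review meeting tuesday", False)

msg = "mortgage rates update"
print("broker:  ", broker.spam_probability(msg))
print("engineer:", engineer.spam_probability(msg))
```

With this toy data the engineer's table scores the message as more likely spam than the broker's does, because "mortgage" shows up in the broker's legitimate mail. A single shared table could not make that distinction.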
The major problem is that the mechanism requires a significant and
continuing false-negative rate to keep the scoring tuned as spammers
come and go.
I dunno; keeping the tuning up to date sounds like a strength to me. It
requires some level of effort, but a much lower level than deleting
every piece of spam by hand.
Of course, the main problem with any and every such system is that it
is looking for characteristics other than "unsolicited" and "bulk."
Yes, and the main problem with the DCC is that it does not.
When I moved last fall, I went through old mail, harvested the addresses
of old friends, and sent out mail with my new address. Some of these
people had never received email from me (they and I were CC:ed on the
same messages from other friends), so I would not have been on their
whitelist. I don't know how many people I sent to, but it was certainly
more than 10--which you say counts as bulk. So, if at least 10 of those
people had been using the DCC, then my message would have been tagged as
UBE, and some of them would not have gotten it. I suppose one might
argue this message was bulk email, but I knew every one of those people
personally, considered them friends (even if I hadn't seen them since
college), and had reason to believe that they would be at least somewhat
pleased to keep track of me. Why should that be filtered?
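Here is a toy model of the failure I'm describing. It is only a sketch of checksum counting as I understand the idea, not the DCC's actual code or protocol: the names are hypothetical, an exact SHA-256 digest stands in for whatever checksums the real system computes, and the threshold of 10 is just the figure quoted in this thread.

```python
import hashlib
from collections import Counter

BULK_THRESHOLD = 10  # assumed from the "more than 10" figure in this thread


class BulkCounter:
    """Hypothetical shared counter of message-body checksums."""

    def __init__(self):
        self.counts = Counter()

    def report(self, body):
        # Each recipient reports the checksum of what it received
        # and gets back the running total for that checksum.
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        self.counts[digest] += 1
        return self.counts[digest]


def is_tagged_bulk(counter, body, sender, whitelist):
    # Tagged when the body has been seen more than the threshold number
    # of times and the recipient has not whitelisted the sender.
    seen = counter.report(body)
    return seen > BULK_THRESHOLD and sender not in whitelist


counter = BulkCounter()
body = "I've moved; here is my new address."
sender = "old.friend@example.org"  # placeholder address

# Twelve recipients, none of whom has the sender on a whitelist.
tagged = [is_tagged_bulk(counter, body, sender, set()) for _ in range(12)]
print(tagged)
```

In this model the first ten recipients get the message untagged and the last two see it tagged as bulk, even though nothing about it is unsolicited in any meaningful sense. A recipient who had whitelisted the sender would be unaffected, which is exactly the protection my old friends would not have had.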
I'm not advocating the Bayesian approach as a silver bullet, mind you;
but I think it's an interesting area to look into. Even if it doesn't
work, the general idea of filtering based on personalized statistics
could lead to something that works better.
--
/===============================================================\
|John Stracke |jstracke(_at_)centivinc(_dot_)com |
|Principal Engineer|http://www.centivinc.com |
|Centiv |My opinions are my own. |
|===============================================================|
|Both candidates are better than a megalomaniac mutant lab mouse|
|bent on world domination...but it's pretty close. |
\===============================================================/