ietf

Re: Careful with those spamtools.....

2003-09-14 17:48:16
Indeed. These open relay blacklist sites were always a highly questionable
source for mail filtering. Quite obviously, open relays have no relationship
to spam...

Agreed and already in public domain:

http://www.imc.org/ube-relay.html

A similar criterion, conceptually more correlated with spam, would be a
blacklist of relays that are dishonest about the previous IP address in the
Received header chain.  I do not think http://www.rfc-ignorant.org/ currently
tracks such non-compliance.
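
To make the Received-chain idea concrete, here is a minimal sketch in Python.
It only pulls out the bracketed IPv4 address each hop claims to have seen and
flags hops that recorded none.  The function names and the "[a.b.c.d]" pattern
are my own assumptions for illustration, not anything rfc-ignorant.org does,
and a header that lies convincingly will pass this check.

```python
import re
from email import message_from_string

# Assumption: the relay records the peer address as a bracketed IPv4 literal.
IP_RE = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def received_ips(raw_message):
    """Return the claimed peer IP of each Received header, newest hop first.

    A hop that recorded no bracketed IPv4 address is returned as None.
    """
    msg = message_from_string(raw_message)
    hops = []
    for header in msg.get_all("Received", []):
        match = IP_RE.search(header)
        hops.append(match.group(1) if match else None)
    return hops

def hops_missing_ip(raw_message):
    """Return the indexes of hops that did not record a previous IP address."""
    return [i for i, ip in enumerate(received_ips(raw_message)) if ip is None]
```

Calling hops_missing_ip() on a raw message gives the positions in the chain
where a relay stayed silent about who connected to it.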

Then again, such a hypothetical database would be mostly useless in
practice, because dishonest proxies come and go faster than we could
record them.  We could test in real time, but tests can be lied to.

There are reliable ways (some proprietary) to detect the dishonest proxies,
but I agree with Dean: much better to just detect the spam directly.

In terms of detecting spam directly, per-message filters which are based
solely on content have a high false-positive cost and are subvertible with
content:

http://citeseer.nj.nec.com/androutsopoulos00learning.html (See Page 9 of the 
PDF linked at top)

Filters based on bulk correlation of content (DCC) require whitelist
maintenance and are subvertible with content.  Filters which require your
senders to opt in are inherently expensive to the email system, generate
many false positives, and are subvertible by forged headers (not to
mention being patented).  A brief taxonomy is here:

http://www.imc.org/ube-sol.html
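
As an illustration of why bulk correlation is subvertible with content, here
is a rough sketch in Python.  It counts exact hashes of a whitespace- and
case-normalized body, which is my own simplification and not DCC's actual
checksum scheme; BulkCounter and its threshold are hypothetical names.

```python
import hashlib
from collections import Counter

def body_checksum(body):
    """Hash the body after lowercasing and collapsing whitespace."""
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class BulkCounter:
    """Flag a message as bulk once its checksum has been seen `threshold` times."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def report(self, body):
        """Record one sighting of the body; return True if it now counts as bulk."""
        key = body_checksum(body)
        self.counts[key] += 1
        return self.counts[key] >= self.threshold
```

Identical copies trip the threshold, but a sender who appends a different
random token to each copy produces a new checksum every time and is never
counted as bulk by this sketch.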

Even if the filter types above haven't been subverted at high rates yet,
they can be:

http://www1.ietf.org/mail-archive/ietf/Current/msg22190.html

We are working on a filtering mechanism which does not suffer from these
sorts of issues, because it actually looks at what is unique about spam, not
just at side effects that are sometimes correlated with it, as the filters
above do.

I agree with Dean, and I think that conceptually ALL existing anti-spam
filtering (that is currently in the public domain and that I am aware of) is
useless and even harmful in the long run, as Dean points out, because it
filters things which are not spam, just sometimes (even if most of the time
so far) correlated with spam.

I've been making points like this for a long time:

http://ixazon.dynip.com/pipermail/nilsimsa/2002-December/000041.html
(my warnings on the dangers of Bayesian anti-spam filtering, which IMO caused
Paul Graham to eventually add a disclaimer to his web page)


Shelby Moore
http://AntiViotic.com