[Asrg] Spam detection system proposal

Hi,

I've been reading ASRG for a couple of days, and am impressed by the
message volume. :-)

However, we need practical ways to fight spammers.  I think we're spending
too much time trying to define "spam" and not enough on understanding spammers.
Here are some ideas that I believe will have real, practical benefits.

What are the differences between a spammer and a legitimate mass mailer?
How can we exploit these differences?  I see two major differences:

1) Spammers want to send out lots of messages cheaply, and don't
particularly care if any one message gets through.  Legitimate mass
mailers want all of their messages to get through.

2) This is just a hunch, but I bet it's true:  Spammers probably have a
higher proportion of bad addresses on their lists than legitimate mass
mailers.  We can help ensure this by poisoning their lists with web pages
of fake addresses.

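The analogy to IDS software is apt here.  Condition (1) can be detected
with a purely local process:  You tempfail mail from unknown senders the
first time (that is, reject it with a temporary SMTP 4xx reply).  A
legitimate MTA queues the message and retries; a spammer who doesn't care
whether any one message gets through often won't bother.  (Better still,
tempfail based on sender-recipient pairs.)  I already do this, and it
reduces spam by a significant percentage (20-25%) with very little cost
to me.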
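The tempfail-first-attempt idea can be sketched as below.  This is a
minimal illustration, not the setup I actually run: the class name
`Greylist`, the `min_delay` retry delay and the exact reply text are
assumptions.  It keys on sender-recipient pairs as suggested above.

```python
import time
from typing import Dict, Optional, Tuple

TEMPFAIL = "451 Please try again later"  # any 4xx reply tells the client to retry
ACCEPT = "250 OK"


class Greylist:
    """Tempfail the first attempt for each (sender, recipient) pair."""

    def __init__(self, min_delay: float = 300.0) -> None:
        # Hypothetical parameter: seconds a pair must wait before a retry is accepted.
        self.min_delay = min_delay
        self.first_seen: Dict[Tuple[str, str], float] = {}

    def check(self, sender: str, recipient: str,
              now: Optional[float] = None) -> str:
        """Return the SMTP reply to give to RCPT TO: for this pair."""
        now = time.time() if now is None else now
        key = (sender.lower(), recipient.lower())
        seen = self.first_seen.get(key)
        if seen is None:
            self.first_seen[key] = now  # first attempt: remember it and tempfail
            return TEMPFAIL
        if now - seen < self.min_delay:
            return TEMPFAIL  # retried too quickly
        return ACCEPT        # a later retry from a queueing MTA is accepted
```

A real deployment would also expire stale entries; that is left out to
keep the sketch short.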

Condition (2) cannot be detected purely locally, but I have a proposal
that can make it possible to detect (2).  Just as we have central clearing
houses for checksums, we can build a system of central clearing houses
for success/failure counts.

Imagine modifying MTA software so that:

- If a RCPT TO: succeeds, it sends a note to the clearing house saying:
  "Sender xyz@domain.net from IP address a.b.c.d sent a successful
  RCPT TO: command"

- If a RCPT TO: fails, a similar failure note is sent.

- Possibly, we could augment the scheme so that mail to a honeypot address
  is noted and counts for more than a simple failure -- we could weight
  the various addresses.
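The note an MTA sends could be as small as one record per RCPT TO:.  The
sketch below is only an illustration: the field names, the one-line wire
format, `classify_rcpt` and the weight of 5 for a honeypot hit are all
hypothetical choices, not part of any existing MTA.

```python
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    HONEYPOT = "honeypot"  # RCPT TO: named a known honeypot address


# Hypothetical weights: a honeypot hit counts for more than a simple failure.
FAILURE_WEIGHT = {Outcome.SUCCESS: 0, Outcome.FAILURE: 1, Outcome.HONEYPOT: 5}


@dataclass(frozen=True)
class RcptReport:
    sender: str     # envelope sender from MAIL FROM:
    client_ip: str  # IP address of the connecting SMTP client
    outcome: Outcome

    def to_line(self) -> str:
        """Serialize as one space-separated line (hypothetical wire format)."""
        return f"{self.sender} {self.client_ip} {self.outcome.value}"


def classify_rcpt(sender: str, client_ip: str, rcpt: str,
                  valid_mailboxes: AbstractSet[str],
                  honeypots: AbstractSet[str]) -> RcptReport:
    """Build the note an MTA would send after handling one RCPT TO: command."""
    addr = rcpt.lower()  # assumption: addresses compared case-insensitively
    if addr in honeypots:
        outcome = Outcome.HONEYPOT
    elif addr in valid_mailboxes:
        outcome = Outcome.SUCCESS
    else:
        outcome = Outcome.FAILURE
    return RcptReport(sender, client_ip, outcome)
```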

The clearing house would maintain the success/failure rate over a
sliding window of 24 hours or so.
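A sliding-window counter at the clearing house might look like this.  It
is an in-memory sketch under stated assumptions: reports for a key arrive
in time order, the key is either a sender address or a client IP, and
the class and method names are made up for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple

WINDOW_SECONDS = 24 * 60 * 60  # "a sliding window of 24 hours or so"


class ClearingHouse:
    """Hypothetical per-key success/failure counts over a sliding window."""

    def __init__(self, window: float = WINDOW_SECONDS) -> None:
        self.window = window
        # key -> deque of (timestamp, success); assumes time-ordered appends
        self.events: Dict[str, Deque[Tuple[float, bool]]] = defaultdict(deque)

    def record(self, key: str, success: bool, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self.events[key].append((now, success))

    def stats(self, key: str, now: Optional[float] = None) -> Tuple[int, int]:
        """Return (samples, failures) reported for key within the window."""
        now = time.time() if now is None else now
        q = self.events[key]
        while q and q[0][0] <= now - self.window:
            q.popleft()  # drop reports older than the window
        failures = sum(1 for _, ok in q if not ok)
        return len(q), failures
```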

When your spam filter is deciding whether or not to accept mail, it
consults the clearing house.  Based on criteria you choose, it can add
X points to the spam score if the success rate for the IP or sender is
lower than a certain amount.  For example, if "xyz@domain.net" has a
failure rate of 15% or more with at least 300 samples reported, then you
add 3 points to your spam score.
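The scoring rule in that example reduces to a few lines.  This is a
sketch only: the function name is hypothetical, and the defaults simply
mirror the example numbers, which each site would set as local policy.

```python
def clearing_house_points(samples: int, failures: int,
                          min_samples: int = 300,
                          failure_threshold: float = 0.15,
                          points: float = 3.0) -> float:
    """Spam-score points to add for one sender or IP, given clearing-house counts.

    Defaults mirror the example: at least 300 samples and a failure rate
    of 15% or more add 3 points.
    """
    if samples < min_samples:
        return 0.0  # too few reports to judge
    failure_rate = failures / samples
    return points if failure_rate >= failure_threshold else 0.0
```

For instance, 80 failures out of 400 reports (a 20% failure rate) gives
3.0, while the same rate over only 200 reports gives 0.0 because the
sample is too small.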

I believe this scheme is pretty easy to implement, is practical, doesn't
lessen privacy too much, and most importantly, requires no changes to
existing Internet protocols or end-user software.  It also uses an objective
measure of spammer behaviour, rather than trying to define the undefinable.

Obviously, the implementation details are important -- we'd want to make
it hard for any one machine to skew the statistics with fake data, etc.
But I don't think this task is any harder than trying to secure DNS :-) and
we all still use and rely on DNS.

Comments welcome. :-)

--
David.

_______________________________________________
Asrg mailing list
Asrg@ietf.org
https://www1.ietf.org/mailman/listinfo/asrg