[Asrg] Re: On the need for reliable, objective data

[List volume too high; I've switched to digest mode]

From: "Gary Feldman" <gaf(_at_)rtr(_dot_)com>

I've seen a number of posts containing statements of the
form "most spam does X", "most spammers do X", "most of
the cost is X", "many users need X", etc.

Does the ASRG have a need for more reliable data to
either confirm or refute statements of this sort?


Yes!

If so, there are at least two tactical questions:

How should such data be managed, stored, reviewed,
made available to the public, etc.?


I think a mechanism similar to DCC could be good.  We need a way for
lots of sensors to dump information into the collection network.  We
need a way to decide what summaries of the data are useful.  And we need
a way to extract the data without compromising privacy (e.g., by reporting
SHA1 hashes of addresses rather than the addresses themselves).
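
To make the hashing idea concrete, here is a minimal sketch in Python.
The function name, the lowercasing step, and the optional shared salt are
my assumptions, not part of DCC or any existing tool; sensors would all
have to use the same salt for their reports to be comparable.

```python
import hashlib


def anonymize_address(address: str, salt: bytes = b"") -> str:
    """Return a hex SHA-1 digest standing in for an e-mail address.

    Assumption: addresses are trimmed and lowercased before hashing so
    that the same mailbox always maps to the same digest.
    """
    normalized = address.strip().lower().encode("utf-8")
    return hashlib.sha1(salt + normalized).hexdigest()


if __name__ == "__main__":
    # A sensor would report this digest instead of the raw address.
    print(anonymize_address("someone@example.org"))
```

A collector can then count how often the same digest shows up across
sensors without ever seeing the address itself.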

What data is needed and how should it be collected?


I think some useful data would be the following:

- Number of RCPT commands from a given IP
- Number of RCPT commands from a given IP that fail due to unknown recipient
- Number of DSNs that bounce, correlated with originating IP.  It might
  be dangerous to put too much stock in this statistic, as others have
  pointed out.
- Number of IP addresses from which a "substantially similar" message is
  received within the last x hours.  (Getting the same message from a
  lot of different IPs might be an indication of spam.)  Collecting
  this data should be a simple modification of Razor or DCC.
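
The counters in the list above could be kept roughly as follows.  This is
only a sketch: the class names are hypothetical, the "digest" is assumed
to be whatever fuzzy checksum a Razor- or DCC-like system already
computes, and the window length stands in for the "x hours" above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IpStats:
    rcpt_total: int = 0      # RCPT commands seen from this IP
    rcpt_unknown: int = 0    # RCPT commands rejected: unknown recipient

    @property
    def unknown_ratio(self) -> float:
        """Fraction of RCPT commands that named an unknown recipient."""
        return self.rcpt_unknown / self.rcpt_total if self.rcpt_total else 0.0


class RcptTally:
    """Per-IP RCPT counters, as one sensor might keep them."""

    def __init__(self) -> None:
        self._stats = defaultdict(IpStats)

    def record(self, ip: str, recipient_known: bool) -> None:
        stats = self._stats[ip]
        stats.rcpt_total += 1
        if not recipient_known:
            stats.rcpt_unknown += 1

    def stats_for(self, ip: str) -> IpStats:
        return self._stats.get(ip, IpStats())


class SimilarMessageTracker:
    """Counts distinct IPs that sent the same digest within a time window."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self._seen = defaultdict(dict)  # digest -> {ip: last-seen timestamp}

    def record(self, digest: str, ip: str, now: float) -> int:
        ips = self._seen[digest]
        ips[ip] = now
        cutoff = now - self.window
        # Forget IPs whose last sighting fell out of the window.
        for stale in [k for k, seen in ips.items() if seen < cutoff]:
            del ips[stale]
        return len(ips)
```

A sensor would call record() once per RCPT command or per message and
periodically report the totals (not the raw addresses) to the collection
network.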

We have to look at what we, the recipients, control, and only trust that.

The spammer controls, to varying degrees:

- The sender address, so that's worthless as a filtering tool.
- The message headers and body, so checksum schemes can be defeated
  relatively easily, as can most content-filtering techniques.  (Sorry,
  guys; them's the breaks.)
- The originating IP address.  This is pretty difficult to fake.
- The reverse-DNS of the originating IP address.  Depending on the situation,
  this might be very easy or a little expensive to fake.  If you're spamming
  through thousands of open proxies, it's probably more trouble than it's
  worth to try to fake the rDNS entries for them all.  If you're sending from
  your own box, it might be worth your time to get it to look like
  smtp-out.yahoo.com.

We control, or can conceivably control:

- The response from our SMTP servers.
- Our own e-mail addresses.
- To some extent, the spammer's mailing list, via responsible poisoning.

<soapbox>
I think any scheme that relies on things beyond our control is doomed.
My proposals (1) to look at bounce ratios across a large portion of the
Internet, (2) to implement a simple way to generate and use disposable
e-mail addresses, and (3) to use responsible list-poisoning, are, I think,
useful because they do things that are expensive for spammers to
sidestep.
</soapbox>
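
As one illustration of proposal (2), disposable addresses could be
generated and checked along these lines.  Everything here is an
assumption made for the sake of the sketch: the "local+label.tag@domain"
layout, the function names, and the use of a keyed HMAC so that the mail
server can recognise addresses it issued without storing them.

```python
import hashlib
import hmac


def _tag(local: str, label: str, key: bytes, tag_len: int) -> str:
    # Truncated keyed digest binding the label to the mailbox.
    msg = f"{local}:{label}".encode("utf-8")
    return hmac.new(key, msg, hashlib.sha256).hexdigest()[:tag_len]


def make_disposable(local: str, domain: str, label: str,
                    key: bytes, tag_len: int = 8) -> str:
    """Build a disposable address for one correspondent or mailing list."""
    return f"{local}+{label}.{_tag(local, label, key, tag_len)}@{domain}"


def is_valid_disposable(address: str, key: bytes, tag_len: int = 8) -> bool:
    """Return True only if the address carries a tag issued with this key."""
    local_part, _, _domain = address.rpartition("@")
    local, plus, rest = local_part.partition("+")
    label, dot, tag = rest.rpartition(".")
    if not (plus and dot and label):
        return False
    expected = _tag(local, label, key, tag_len)
    return hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8"))
```

The SMTP server would accept a RCPT for such an address only while
is_valid_disposable() holds and the label has not been retired, which
keeps the decision entirely on the recipient's side.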

--
David.