ietf-asrg

Re: [Asrg] Building a better blacklist

2006-04-07 23:44:04

On Apr 1, 2006, at 9:32 AM, Chris Lewis wrote:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Dan Oetting wrote:
Why are the performance figures for blacklists so low? I saw someone
post a figure of 80% blocking, while less than 50% of the spam I get in
my unfiltered accounts would have been blocked by SBL+XBL. Why can't
the blocking rate be in the high 90's?

Because, quite simply, a significant amount of spam comes from "mixed
sources", such as ISP mail servers. Unless you're willing to put up with
very high false-positive rates (FPs), DNSBLs are simply not suitable for
_that_ segment of spam.

I found a combination that would block nearly 100% of the spam I receive. As I said earlier, the SBL+XBL would block 50% of the spam I see. The DCC Reputation system would block the other 50%. An interesting observation is that I see no overlap between the two. Either every commercial DCC server is protected by the SBL+XBL or my ISP is blocking on the combined score. Only one spam source that hit my account had a mixed reputation on the DCC list and was not listed in the blacklists.

We're still 100% blocking tin.it's mail servers.

That's the one.


The main
concern I suppose is that they can't afford to lose mail sent to their
customers. To address this, a blacklist system could be designed to
recover automatically when the spam stops.

Many DNSBLs already do this; virtually all of the reputable ones do.

The DCC Reputation system is almost exactly what I had been seeking for the last four years. It even has the distributed spamtraps and a flood-distributed update network, except that its cycle time is on the order of a few days instead of a few hours. And it's a closed system, only available to the commercially licensed sites.

The DCC Reputation doesn't appear to use any weighting based on the reliability of the spamtraps. I feel that the field of traps needs to be very diverse which means that some of the traps are going to get more noise than others and this should be taken into account when deciding if a spam threshold has been reached. Also, releasing the exact hit counts for a source address is dangerous and could lead to simple search attacks to discover the trap addresses.
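
As a rough illustration of the weighting idea (this is not how the DCC works; the function names, the 0-to-1 reliability scale, and the threshold of 3.0 are all assumptions made for the example), each trap's hit count could be scaled by a reliability weight before comparing the total against the listing threshold:

```python
def weighted_score(hits, reliability):
    """Sum spamtrap hits for one source, scaling each trap by its reliability.

    hits:        dict mapping trap id -> number of hits from the source
    reliability: dict mapping trap id -> weight in [0, 1] (hypothetical scale);
                 traps with no known weight contribute nothing.
    """
    return sum(count * reliability.get(trap, 0.0) for trap, count in hits.items())


def is_listed(hits, reliability, threshold=3.0):
    """Return True when the weighted hit total reaches the (assumed) threshold."""
    return weighted_score(hits, reliability) >= threshold


# A noisy trap (weight 0.25) needs many more hits than a clean one (weight 1.0).
reliability = {"noisy_trap": 0.25, "clean_trap": 1.0}
print(is_listed({"noisy_trap": 4, "clean_trap": 1}, reliability))
print(is_listed({"clean_trap": 3}, reliability))
```

Publishing only the listed/not-listed result, rather than the score itself, would also avoid exposing exact hit counts.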


If the mail is rejected with
a 4xx response code the non-spam mail from legitimate ISPs would be
delivered (only slightly delayed) once the spam is cleaned up.

That's presuming that "spam is cleaned up" in a sufficiently timely
fashion to fall within retry limits.  It seldom is.  Indeed, most spam
simply doesn't retry, so, issuing a 4xx response the first time you see a particular spam is a highly useful technique. It's called greylisting.

That leads to the question of how to clean up the spam in real time.
Since the spam traps have already captured samples of the spam
emanating from the blocked ISP, it should be easy enough to construct a
profile or signature of the spam that the source ISP could use to
quarantine the remaining spam in the queue.

Some ISPs have done this in the past; however, it's seldom an effective or general enough solution. Furthermore, most spam _never_ sits in
queues, because it's not coming from mail servers.

Who is worried about the non-mail server sources of spam? They will be listed for as long as they continue to try and deliver spam.

It's the "significant amount of spam comes from 'mixed sources', such as ISP mail servers" that needs to be considered.

What I am proposing is a fast-response (greylist) advisory that allows recipients to delay the acceptance of mail from the listed ISPs, while signaling the listed ISPs in real time that they have a problem so that they can clean it up at the source. The advisory is released when spam is no longer detected, so the mail will again flow freely from the now clean source. In cases where the source ISP doesn't clean up in a reasonable time, the recipient ISPs would fall back on their own filtering.

Unlike the typical blacklist, this advisory list can afford to have a hair trigger because the penalty of a false listing is at most a short delay in mail delivery. There is still a tradeoff to balance this penalty with the advantage of stopping more spam. Whitelists based on sender and recipient could circumvent the delay and further minimize the negative impacts.
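
A minimal sketch of how a recipient might consult such an advisory, under stated assumptions: the class name, the one-hour quiet period, and the choice of 451 as the temporary-failure code are illustrative and not part of the proposal. A single trap hit lists the source (the hair trigger), the listing lapses on its own once no spam has been seen for the quiet period, and whitelisted sender/recipient pairs bypass the delay.

```python
import time


class AdvisoryList:
    """Hypothetical greylist advisory: defer mail from recently spamming sources."""

    def __init__(self, quiet_period=3600.0):
        self.quiet_period = quiet_period  # seconds without spam before release
        self.last_spam = {}               # source address -> time of last trap hit
        self.whitelist = set()            # (sender, recipient) pairs exempt from delay

    def report_spam(self, source, now=None):
        """Record a spamtrap hit; one hit is enough to list the source."""
        self.last_spam[source] = time.time() if now is None else now

    def is_listed(self, source, now=None):
        """A source stays listed only while spam was seen within the quiet period."""
        now = time.time() if now is None else now
        seen = self.last_spam.get(source)
        return seen is not None and now - seen < self.quiet_period

    def smtp_code(self, source, sender, recipient, now=None):
        """Return 250 to accept, or 451 (temporary failure) to ask for a retry."""
        if (sender, recipient) in self.whitelist:
            return 250
        return 451 if self.is_listed(source, now) else 250
```

Because the response is a 4xx rather than a 5xx, a legitimate sending server queues the message and retries, so a false listing costs a delay rather than lost mail.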


One issue is how to signal the sending ISP in real time to tell them exactly what their problem is. [If this were the single ultimate solution to the spam problem then every ISP would subscribe to the advisory and the problem would be solved. :^} ] Realistically, an ISP that is accepting (or quarantining) mail during the advisory could return a specific response when, for instance, the mail is detected as bulk by the DCC. The sending ISP, upon receiving a number of these responses for mail from the same local user, would know that the user is probably misbehaving and could take appropriate corrective action.
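
On the sending side, the bookkeeping could be as simple as a per-user counter. The sketch below is only illustrative: the class name, the threshold of five responses, and the idea that the sending server can attribute each "detected as bulk" response to a local user are assumptions, and no particular response format is implied.

```python
from collections import Counter


class BulkFeedback:
    """Hypothetical tally of 'detected as bulk' responses per local sender."""

    def __init__(self, threshold=5):
        self.threshold = threshold  # responses before a user is flagged (assumed)
        self.counts = Counter()     # local user -> bulk responses received

    def record(self, local_user):
        """Count one bulk response; return True once the user reaches the threshold."""
        self.counts[local_user] += 1
        return self.counts[local_user] >= self.threshold

    def flagged_users(self):
        """Users whose outgoing mail should be held for review."""
        return sorted(u for u, n in self.counts.items() if n >= self.threshold)
```

A flagged user's queued mail could then be quarantined at the source, which is the cleanup step the advisory is waiting for.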


What is primarily gained by this proposed advisory list is time: time to make a better determination on whether to block the listed ISP or not, and time to build a better signature of the bulk mail. The spam that would have come out in this time interval can be kept out of the recipients' mailboxes.


-- Dan Oetting


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg
