Re: [Asrg] Building a better blacklist


On Apr 1, 2006, at 9:32 AM, Chris Lewis wrote:


Dan Oetting wrote:
Why are the performance figures for blacklists so low? I saw someone
post a figure of 80% blocking, yet less than 50% of the spam I get in
my unfiltered accounts would have been blocked by SBL+XBL. Why can't
the blocking rate be in the high 90s?
Because, quite simply, a significant amount of spam comes from "mixed
sources", such as ISP mail servers, so unless you're willing to put up
with very high FPs, DNSBLs are simply not suitable for _that_ segment of
spam.

I found a combination that would block nearly 100% of the spam I receive. As I said earlier, the SBL+XBL would block 50% of the spam I see. The DCC Reputation system would block the other 50%. An interesting observation is that I see no overlap between the two. Either every commercial DCC server is protected by the SBL+XBL, or my ISP is blocking on the combined score. Only one spam source that hit my account had a mixed reputation on the DCC list and was not listed in the blacklists.

We're still 100% blocking tin.it's mail servers.


That's the one.
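For reference, blocking on the combination described above amounts to an OR of two independent checks. A minimal Python sketch, assuming the usual DNSBL lookup convention (reverse the IPv4 octets, append the list's zone, and treat any 127.0.0.0/8 answer as "listed"). The zone `dnsbl.example.org` is a placeholder for the real list zone, and `dcc_reputation_bad` is a hypothetical stand-in for the DCC Reputation check, which is a closed system whose interface is not described here.

```python
import ipaddress
import socket
from typing import Callable, Iterable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a DNSBL client looks up: reversed IPv4 octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the DNSBL returns an A record in 127.0.0.0/8 for this address."""
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # NXDOMAIN (or lookup failure): treat as not listed
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")


def should_block(ip: str, dnsbl_zones: Iterable[str],
                 dcc_reputation_bad: Callable[[str], bool],
                 resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """Block if any DNSBL lists the source, or the (hypothetical) reputation check flags it."""
    if any(is_listed(ip, zone, resolve) for zone in dnsbl_zones):
        return True
    return dcc_reputation_bad(ip)
```

Passing `resolve` as a parameter keeps the decision logic testable without live DNS; in production the default `socket.gethostbyname` does the lookup.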

The main
concern, I suppose, is that they can't afford to lose mail sent to their
customers. To address this, a blacklist system could be designed to
recover automatically when the spam stops.


Many DNSBLs already do this; virtually all of the reputable ones do.
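Automatic recovery of this kind can be as simple as expiring a listing once no spam has been seen from the source for some quiet period. A minimal Python sketch; the `quiet_period` value is an illustrative assumption, and real DNSBLs each use their own delisting rules.

```python
import time
from typing import Dict, Optional


class ExpiringBlacklist:
    """Listings expire automatically once no spam has been seen for `quiet_period` seconds."""

    def __init__(self, quiet_period: float) -> None:
        self.quiet_period = quiet_period
        self._last_spam: Dict[str, float] = {}  # source IP -> time of most recent spam

    def report_spam(self, ip: str, now: Optional[float] = None) -> None:
        # Each new spam report refreshes the listing.
        self._last_spam[ip] = time.time() if now is None else now

    def is_listed(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        last = self._last_spam.get(ip)
        if last is None:
            return False
        if now - last >= self.quiet_period:
            del self._last_spam[ip]  # spam has stopped long enough: delist
            return False
        return True
```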

The DCC Reputation system is almost exactly what I have been seeking for the last four years. It's even got the distributed spamtraps and a flood-distributed update network, except that its cycle time is on the order of a few days instead of a few hours. And it's a closed system, available only to commercially licensed sites.

The DCC Reputation system doesn't appear to weight hits by the reliability of the individual spamtraps. I feel that the field of traps needs to be very diverse, which means that some traps are going to get more noise than others, and this should be taken into account when deciding whether a spam threshold has been reached. Also, releasing the exact hit counts for a source address is dangerous: an attacker could mail candidate addresses and watch which ones move the count, a simple search attack that discovers the trap addresses.
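Both points can be sketched together: score a source by summing per-trap reliability weights instead of raw hits, and publish only a coarse level rather than exact counts. A minimal Python sketch; the trap IDs, weights, `default_weight`, and the three level names are all hypothetical, and coarse publication is only one possible way to blunt search attacks.

```python
from typing import Dict, Iterable


def weighted_score(hits: Iterable[str], trap_weight: Dict[str, float],
                   default_weight: float = 0.5) -> float:
    """Sum reliability weights of the traps that caught spam from one source.

    Each element of `hits` is the ID of a trap that was hit; noisy traps get
    low weights, clean ones high weights, unknown traps `default_weight`.
    """
    return sum(trap_weight.get(trap, default_weight) for trap in hits)


def published_level(score: float, threshold: float) -> str:
    """Expose only a coarse level, never the per-trap hit counts."""
    if score >= threshold:
        return "listed"
    if score >= threshold / 2:
        return "suspect"
    return "clean"
```

A single hit on a noisy trap then barely moves the score, while repeated hits on reliable traps cross the threshold quickly.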

If the mail is rejected with
a 4xx response code, the non-spam mail from legitimate ISPs would be
delivered (only slightly delayed) once the spam is cleaned up.
That's presuming that "spam is cleaned up" in a sufficiently timely
fashion to fall within retry limits. It seldom is. Indeed, most spam
simply doesn't retry, so issuing a 4xx response the first time you see
a particular spam is a highly useful technique. It's called greylisting.
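As usually implemented, greylisting keys on the (client IP, envelope sender, envelope recipient) triplet: the first attempt gets a temporary failure, and a retry after a minimum delay is accepted and remembered. A minimal in-memory Python sketch; the 450 code, the five-minute `min_delay`, and the four-hour `expiry` are illustrative choices, not values from this thread.

```python
import time
from typing import Dict, Optional, Set, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)


class Greylist:
    def __init__(self, min_delay: float = 300.0, expiry: float = 4 * 3600.0) -> None:
        self.min_delay = min_delay  # retries sooner than this are still deferred
        self.expiry = expiry        # first-seen records older than this start over
        self._first_seen: Dict[Triplet, float] = {}
        self._passed: Set[Triplet] = set()

    def check(self, ip: str, sender: str, recipient: str,
              now: Optional[float] = None) -> Tuple[int, str]:
        now = time.time() if now is None else now
        key = (ip, sender, recipient)
        if key in self._passed:
            return 250, "OK"
        first = self._first_seen.get(key)
        if first is None or now - first > self.expiry:
            # First sighting (or stale record): defer. Most spamware never retries.
            self._first_seen[key] = now
            return 450, "Greylisted, please retry later"
        if now - first < self.min_delay:
            return 450, "Greylisted, please retry later"
        # A real MTA came back after the delay: accept and remember the triplet.
        self._passed.add(key)
        del self._first_seen[key]
        return 250, "OK"
```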
That leads to the question of how to clean up the spam in real time.
Since the spam traps have already captured samples of the spam
emanating from the blocked ISP, it should be easy enough to construct a
profile or signature of the spam that the source ISP could use to
quarantine the remaining spam in the queue.
Some ISPs have done this in the past; however, it's seldom an effective
or generalized enough solution. Furthermore, most spam _never_ sits in
queues, because it's not coming from mail servers.
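The trap-sample signature idea can be illustrated with a normalized body checksum: build signatures from the trap samples, then flag queued messages that match. A minimal Python sketch; this simple normalize-and-hash scheme is a stand-in chosen for illustration and is not DCC's fuzzy checksum algorithm, and representing the queue as a list of body strings is an assumption.

```python
import hashlib
import re
from typing import Iterable, List, Set


def body_signature(body: str) -> str:
    """Normalize a message body (case, digit runs, whitespace) and hash it."""
    text = body.lower()
    text = re.sub(r"\d+", "#", text)          # mask phone numbers, IDs, prices
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace and line breaks
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_signatures(trap_samples: Iterable[str]) -> Set[str]:
    """Signatures derived from spam captured by the traps."""
    return {body_signature(sample) for sample in trap_samples}


def quarantine(queue: Iterable[str], signatures: Set[str]) -> List[int]:
    """Return the indices of queued messages that match any trap signature."""
    return [i for i, body in enumerate(queue) if body_signature(body) in signatures]
```

Exact-match hashing after normalization is easy to evade with small text changes, which is one reason fuzzier checksums are used in practice.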

Who is worried about the non-mail-server sources of spam? They will be listed for as long as they continue trying to deliver spam.

It's the "significant amount of spam comes from 'mixed sources', such as ISP mail servers" that needs to be considered.

What I am proposing is a fast-response (greylist) advisory that allows recipients to delay acceptance of mail from the listed ISPs while signaling those ISPs in real time that they have a problem, so they can clean it up at the source. The advisory is then released when spam is no longer detected, so mail will again flow freely from the now-clean source. In cases where the source ISP doesn't clean up in a reasonable time, the recipient ISPs would fall back on their own filtering.
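The proposed advisory has a small lifecycle: list and notify on the first trap hit, tell recipients to defer while the source is listed, release after a quiet period, and tell recipients to fall back on their own filtering if cleanup takes too long. A minimal Python sketch of that lifecycle; `quiet_period`, `cleanup_deadline`, the `notify` callback, and the action names are all hypothetical, since the proposal does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Listing:
    listed_at: float  # when the advisory was raised for this source
    last_spam: float  # most recent trap hit from this source


class Advisory:
    def __init__(self, quiet_period: float, cleanup_deadline: float,
                 notify: Callable[[str, str], None]) -> None:
        self.quiet_period = quiet_period          # no spam this long -> release
        self.cleanup_deadline = cleanup_deadline  # listed this long -> recipients filter
        self.notify = notify                      # hypothetical channel to the source ISP
        self._listings: Dict[str, Listing] = {}

    def trap_hit(self, source: str, now: float, reason: str) -> None:
        entry = self._listings.get(source)
        if entry is None:
            self._listings[source] = Listing(listed_at=now, last_spam=now)
            self.notify(source, reason)  # signal the source ISP right away
        else:
            entry.last_spam = now

    def action(self, source: str, now: float) -> str:
        """What a recipient should do with mail from `source` right now."""
        entry = self._listings.get(source)
        if entry is None:
            return "accept"
        if now - entry.last_spam >= self.quiet_period:
            del self._listings[source]  # spam stopped: release the advisory
            return "accept"
        if now - entry.listed_at >= self.cleanup_deadline:
            return "filter"  # not cleaned up in time: fall back on own filtering
        return "tempfail"    # delay acceptance with a 4xx
```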

Unlike the typical blacklist, this advisory list can afford a hair trigger, because the penalty for a false listing is at most a short delay in mail delivery. There is still a tradeoff between this penalty and the advantage of stopping more spam. Whitelists based on sender and recipient could bypass the delay and further minimize the negative impact.
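On the recipient side, the whitelist check simply runs before the advisory check. A minimal Python sketch; the auto-whitelisting rule (a remote correspondent becomes whitelisted once a local user has mailed them) and the 450 reply text are hypothetical illustrations of "whitelists based on sender and recipient".

```python
from typing import Set, Tuple


class AdvisoryGate:
    """Recipient-side SMTP decision: whitelisted (sender, recipient) pairs skip the delay."""

    def __init__(self) -> None:
        # Pairs are stored as (remote sender, local recipient).
        self.whitelist: Set[Tuple[str, str]] = set()

    def record_outgoing(self, local_user: str, remote_addr: str) -> None:
        # Hypothetical auto-whitelisting: once a local user has mailed someone,
        # mail from that correspondent to that user bypasses the advisory.
        self.whitelist.add((remote_addr, local_user))

    def decide(self, sender: str, recipient: str, source_listed: bool) -> Tuple[int, str]:
        if (sender, recipient) in self.whitelist:
            return 250, "OK"
        if source_listed:
            # Temporary failure: a legitimate MTA will retry later.
            return 450, "Source under spam advisory, please retry later"
        return 250, "OK"
```

Because the SMTP envelope sender is not authenticated, a pair whitelist mainly limits collateral delay for known correspondents rather than serving as a security boundary.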

One issue is how to signal the sending ISP in real time to tell them exactly what their problem is. [If this were the single ultimate solution to the spam problem, every ISP would subscribe to the advisory and the problem would be solved. :^} ] Realistically, an ISP that is accepting (or quarantining) mail during the advisory could return a specific response when, for instance, the mail is detected as bulk by the DCC. The sending ISP, upon receiving a number of these responses for mail from the same local user, would know that the user is probably misbehaving and could take appropriate corrective action.
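On the sending side, acting on such responses means counting them per local user over a sliding time window. A minimal Python sketch; no standard "detected as bulk" reply exists, so the `BULK` marker string, the threshold, and the window length are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BulkFeedbackMonitor:
    """Counts 'detected as bulk' replies per local user within a sliding time window."""

    def __init__(self, threshold: int, window: float) -> None:
        self.threshold = threshold  # replies needed before flagging the user
        self.window = window        # seconds of history to consider
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, user: str, reply_text: str, now: float,
               marker: str = "BULK") -> bool:
        """Record one SMTP reply for mail sent by `user`.

        Returns True once the user has crossed the threshold within the window.
        """
        if marker not in reply_text:
            return False
        events = self._events[user]
        events.append(now)
        while events and now - events[0] > self.window:
            events.popleft()  # drop replies that fell out of the window
        return len(events) >= self.threshold
```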

What is primarily gained by this proposed advisory list is time to make a better determination on whether or not to block the listed ISP, and time to build a better signature of the bulk mail. The spam that would have come out during this interval can be kept out of the recipients' mailboxes.



-- Dan Oetting


_______________________________________________
Asrg mailing list
Asrg(_at_)ietf(_dot_)org
https://www1.ietf.org/mailman/listinfo/asrg