ietf-mxcomp

Re: RHSBL scalability (affects MARID proposals).

2004-05-13 10:12:00

On 5/13/2004 6:24 AM, John Leslie sent forth electrons to convey:

Matthew Elvey <matthew(_at_)elvey(_dot_)com> wrote:
Has anyone here considered the issue of RHSBL scalability, compared to regular DNSBL scalability?

  I have, though not in great detail.

A simple IPv4 DNSBL (one that just answers yes/no for a listing) can, in the absolute worst case, store its DB in 512MB (2^32 bits) of RAM.

  Indeed, these are often downloaded in their entirety.
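
To make the 512MB figure concrete, here is a minimal sketch of a one-bit-per-address table. The class name NetworkBitmap and its methods are made up for illustration; no real DNSBL is claimed to be implemented this way. It covers an arbitrary CIDR block so it can be tried on a small range without allocating the full table.

```python
import ipaddress


def full_table_bytes():
    """Bytes needed for one bit per IPv4 address (2**32 bits)."""
    return ipaddress.ip_network("0.0.0.0/0").num_addresses // 8


class NetworkBitmap:
    """Hypothetical yes/no listing table: one bit per address in a CIDR block."""

    def __init__(self, cidr="0.0.0.0/0"):
        self.net = ipaddress.ip_network(cidr)
        # Round up so blocks smaller than 8 addresses still get one byte.
        self.table = bytearray((self.net.num_addresses + 7) // 8)

    def _offset(self, ip):
        addr = ipaddress.ip_address(ip)
        if addr not in self.net:
            raise ValueError(f"{ip} is outside {self.net}")
        return int(addr) - int(self.net.network_address)

    def add(self, ip):
        i = self._offset(ip)
        self.table[i >> 3] |= 1 << (i & 7)

    def listed(self, ip):
        i = self._offset(ip)
        return bool(self.table[i >> 3] & (1 << (i & 7)))
```

For example, NetworkBitmap("192.0.2.0/24") needs only 32 bytes, while the default 0.0.0.0/0 allocates the whole 512MB up front.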

Assume spammers churn through $5 domains in very high volume and notice when BL maintainers expire any of them, so every listing has to stay in the BL. How big a BL are we talking about?

  Something significantly smaller, IMHO -- otherwise the registrars
would get filthy rich. ;^) Spammers are nothing if not cheap.

From http://www.dailychanges.com/:
Changes are being tracked for (.COM, .NET, .ORG, .INFO, .BIZ, .US, .WS)
Today's Date 5/13/2004 9:51 AM PST

Status of domains on 5/13/2004
38,977,593  All (Total Domains)
28,306,930  .COM
 4,656,800  .NET
 2,942,751  .ORG
 1,161,175  .INFO
   979,265  .BIZ
   769,034  .US
So, as a wild guesstimate: 100 million domains worldwide, names ~18 bytes long, half of them spammers'.
That's 50 million * 18 = 900 million bytes, uncompressed.

Reality check: could half be spammers'? That's $5 * 50 million = $250 million. They don't have that much cash.
So it's way less than 900 MB.  :)
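
The back-of-envelope numbers can be rechecked with a few lines. This is only the arithmetic from this message; the function name rhsbl_estimate is made up, and the inputs are the guesses above, not measurements.

```python
def rhsbl_estimate(total_domains, avg_name_bytes, spammer_fraction, cost_per_domain):
    """Return (listed domains, uncompressed bytes, registration cost in dollars)."""
    listed = int(total_domains * spammer_fraction)
    return listed, listed * avg_name_bytes, listed * cost_per_domain


# Guesses from the message: 100 million domains, ~18 bytes each,
# half of them spammers', $5 per registration.
listed, size_bytes, cost = rhsbl_estimate(100_000_000, 18, 0.5, 5)
print(listed, size_bytes, cost)
```

That comes to 50 million listings, 900 million bytes and $250 million in registration fees. The last figure is the one that makes the "half are spammers" guess implausible.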
Factoring in increased registration by spammers and some need to list third- and fourth-level domains in various areas of the namespace (e.g. *.studentdorms.oxford.ac.uk vs. *.cs.oxford.ac.uk), we're still looking at something that fits in memory on a nice machine. What I think needs to be avoided is a disk seek on every lookup.
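
As a rough sketch of what an in-memory lookup with third- and fourth-level listings could look like: a set of listed names, checked against the queried name and each of its parent domains. The class DomainBlocklist is hypothetical, and walking the suffixes is an assumption about how wildcard-style listings might be handled, not a description of any existing RHSBL.

```python
class DomainBlocklist:
    """Hypothetical in-memory RHSBL: a listing covers the name and everything beneath it."""

    def __init__(self, entries):
        self.entries = {self._norm(e) for e in entries}

    @staticmethod
    def _norm(name):
        # Case-insensitive, ignore a trailing root dot.
        return name.lower().rstrip(".")

    def listed(self, name):
        labels = self._norm(name).split(".")
        # Check the full name first, then each shorter suffix; no disk access involved.
        for i in range(len(labels)):
            if ".".join(labels[i:]) in self.entries:
                return True
        return False


bl = DomainBlocklist(["studentdorms.oxford.ac.uk"])
print(bl.listed("room12.studentdorms.oxford.ac.uk"))  # covered by the listing
print(bl.listed("www.cs.oxford.ac.uk"))               # different subtree, not listed
```

Each lookup costs at most one set probe per label in the queried name, all in RAM.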

  In practice, I expect reputation services to run more like whitelists,
returning some default response (perhaps recommending a temporary error)
when they first hear of a domain, slowly maturing it to a good rating as
time passes without complaints.
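
A minimal sketch of that whitelist-like behaviour follows, under stated assumptions. The class name, the 30-day maturity period, the rating labels, and the rule that any complaint yields "poor" are all invented for illustration; the message above specifies none of them.

```python
from datetime import datetime, timedelta


class ReputationService:
    """Hypothetical reputation store that matures unknown domains over time."""

    def __init__(self, maturity=timedelta(days=30)):  # 30 days is an assumed value
        self.maturity = maturity
        self.first_seen = {}
        self.complaints = {}

    @staticmethod
    def _norm(domain):
        return domain.lower().rstrip(".")

    def complain(self, domain):
        d = self._norm(domain)
        self.complaints[d] = self.complaints.get(d, 0) + 1

    def rate(self, domain, now):
        d = self._norm(domain)
        if d not in self.first_seen:
            # First time we hear of the domain: record it, suggest a temporary error.
            self.first_seen[d] = now
            return "tempfail"
        if self.complaints.get(d, 0) > 0:
            return "poor"  # assumed policy: any complaint blocks maturation
        if now - self.first_seen[d] >= self.maturity:
            return "good"
        return "unrated"
```

With these assumptions, a domain first queried on 13 May 2004 gets "tempfail", then "unrated" on later queries, and "good" from 12 June onward if nobody complains.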

This is relevant because all the proposals require RHSBLs to work: SPF explicitly, and the others (DMP, RMX, FSV, DK, CID, and CSV) at least implicitly.

  We don't require any particular mechanism, so it's really not "RHSBLs"
that must work. But there are some real issues here. (I skillfully
glossed over the question of how many third-level domains they'll have to
deal with...)

  The problems which do worry me aren't those of scaling (hundred-gig
disks are _so_ cheap), but those of near-monopoly ISPs "trusting" only
reputation services which demand a fee for whitelisting. Still, that
would be easier to deal with than my experience of being blacklisted
by Verizon...

--
John Leslie <john(_at_)jlc(_dot_)net>