
[Asrg] An entirely incompatible approach for IPv6 DNSBLs

2010-12-17 20:39:50
I'm coming to the conclusion that we can't fix the problem with v6 DNSBLs by applying band-aids to v4 DNSBLs.

So here's a 100% incompatible design, intended to play well with the DNS.

Regards,
John Levine, johnl(_at_)iecc(_dot_)com, Primary Perpetrator of "The Internet for 
Dummies",
Please consider the environment before reading this e-mail. http://jl.ly

-- snip --

Here's a rough draft of a new, improved idea I have to do DNSBLs and DNSWLs
for IPv6.  (It should work for v4, too, if anyone cares.)

This is basically an implementation of B-trees using the DNS as the
storage device.

Assumptions:

* Handle arbitrary mixtures of CIDRs and individual IPs.

* Must work well with existing DNS servers and caches.  The number of
  different queries should be limited, both to limit cache-server
  traffic and the number of cache entries it uses.  I assume that
  client-cache queries are cheap, cache-server queries are much more
  expensive.

* Doesn't assume senders will be well behaved.  In particular, bad guys
  may use a new IP for every message, hopping around within each /64 (or
  whatever) in which a zombie lives, and there will be many zombies active
  at the same time.

* Doesn't assume MTAs remember query history from one lookup to the next,
  so the necessary info should be available in the DNS cache.  (If they
  do, that doesn't hurt.)

* Should be possible to make it work with DNSSEC, but doesn't depend
  on DNSSEC.

* No attempt to be compatible with existing DNSBLs.  I expect it will
  mostly be served from something like rbldnsd, although it would also
  be possible to have an external program create the zone files and
  serve them from something like BIND.

Each DNS record in this scheme is a blob of bytes containing multiple
self-describing logical records, each of which is a CIDR range, which
might be a single IP if the CIDR is a /128.  In each blob, the CIDRs
are in address order from lowest to highest.  The name of each record
is the lowest address.  In this context there is no advantage to the
usual nibble-reversed naming, so the name is just 32 ASCII
characters: the address written as a hex number.  One blob is the
tree root, which as an exception is named
00000000000000000000000000000000 even though it probably doesn't
contain a CIDR starting at ::.
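The naming rule is mechanical, so here's a minimal sketch of it in
Python.  The zone name dnsxl.example is a made-up placeholder, not
part of the proposal:

```python
# Sketch: derive a blob's owner name from its lowest address, per the
# naming rule above.  The zone "dnsxl.example" is a placeholder.
import ipaddress

def blob_name(lowest_addr, zone="dnsxl.example"):
    """The name is the 128-bit address as 32 hex characters."""
    return f"{int(ipaddress.IPv6Address(lowest_addr)):032x}.{zone}"

print(blob_name("2001:db8::1"))
# -> 20010db8000000000000000000000001.dnsxl.example
print(blob_name("::"))  # the root blob's all-zeros name
```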

The obvious way to represent a blob is as a TXT record, with all the
strings concatenated.  The more entries you can fit in a blob, the
better this will work, so although it would be possible for each
logical record to be a separate string in the TXT record, I don't
propose to do that.

Each blob contains a set of CIDRs that are in the DNSxL.  For blobs
that are not leaves (there's a per-blob flag, described below), the
records in the blob also partition the address space: the CIDRs that
fall between the ones in the current blob are stored in sub-blobs,
each named by the base address of a record in the current blob.

To minimize edge cases, the root blob always contains the lowest and
highest entries.

So to do the search:

1. Make the root blob the current blob and fetch it.

2. Look inside the current blob for a CIDR that contains your target
   IP address.  If you find it, stop, the BL contains the IP.

3. If your IP is lower than the first entry in the current blob, or
   higher than the last entry, stop, the BL does not contain the IP.

4. If this is a leaf blob, stop, the BL does not contain the IP.

5. Find the CIDR in the current blob whose base is just below your
   IP.  Use the base address of that CIDR as the name of the new
   blob, fetch that blob, and make it the current one.

6. Go to step 2.

It should be evident that this is analogous to a binary tree search,
except that each node has a lot more than two descendants.
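The six steps above can be sketched in a few lines of Python.  This
assumes blobs have already been fetched and decoded into Python
objects; the dnsxl dict stands in for the DNS, mapping a 32-hex-char
blob name to its parsed contents (a real client would issue a TXT
query and decode the answer instead):

```python
# Sketch of the search loop.  dnsxl maps blob name -> (is_leaf, cidrs),
# where cidrs is a list of IPv6Network objects sorted low to high.
import ipaddress

ROOT = "0" * 32

def contains(dnsxl, ip):
    addr = ipaddress.IPv6Address(ip)
    name = ROOT                          # step 1: start at the root
    while True:                          # step 6: repeat from step 2
        is_leaf, cidrs = dnsxl[name]     # "fetch" the current blob
        for net in cidrs:                # step 2: direct hit?
            if addr in net:
                return True
        if (addr < cidrs[0].network_address
                or addr > cidrs[-1].broadcast_address):
            return False                 # step 3: outside this blob's range
        if is_leaf:
            return False                 # step 4: nowhere left to descend
        # step 5: descend via the entry whose base is just below the IP
        below = max(n.network_address for n in cidrs
                    if n.network_address < addr)
        name = f"{int(below):032x}"

# Tiny hand-built example tree (addresses chosen arbitrarily).
N = ipaddress.IPv6Network
def label(a):
    return f"{int(ipaddress.IPv6Address(a)):032x}"

demo = {
    ROOT: (False, [N("::1/128"), N("2001:db8::/64"), N("ff00::/8")]),
    label("::1"): (True, [N("100::/64")]),
    label("2001:db8::"): (True, [N("2001:db9::/64")]),
}
print(contains(demo, "2001:db9::5"))   # found one level down the tree
print(contains(demo, "2001:dba::1"))   # not listed
```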

Details of blob representation:

The first byte of each blob is a flag:

  L P P P P P P P

The L bit is set if this is a leaf.  The PPPPPPP is the prefix size.
If all of the CIDR addresses in the blob have the same initial bits as
the name of the blob, which is fairly likely down toward the leaves,
this is the number of common initial bits.  The prefix bits are
logically prefixed to each CIDR address in the blob.

After that is some number of CIDRs:

  X S S S S S S S
  Address

X is reserved for now, perhaps allowing multiple kinds of entries.

SSSSSSS is 0-127, the CIDR size minus one.

The Address is (S+1)-P bits long, i.e. the CIDR size minus the
prefix size, rounded up to a whole number of bytes.  That is, it
omits the P prefix bits, which are obtained from the name of the
blob, and the suffix bits beyond the CIDR size.

For example, say an entry is 2001:1234:5678:9123::/64, and the prefix
size is 16.  Then the entry would be (in hex)

3f 12 34 56 78 91 23

The 3f is the size byte (X=0, CIDR size of 64 minus one), the 2001
is omitted since that's the prefix, and the rest is the address.
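Here's a sketch of packing one logical record that reproduces the
worked example.  It only handles prefix sizes that are a multiple of
8 bits, as in the /16 example; bit-granular prefixes would need a
shift step:

```python
# Sketch: pack one entry as a size byte (X=0, SSSSSSS = CIDR size
# minus one) followed by the address with the blob's P prefix bits
# and the suffix bits stripped.  Assumes P is a multiple of 8.
import ipaddress

def pack_entry(cidr, prefix_bits):
    net = ipaddress.IPv6Network(cidr)
    size = net.prefixlen                    # CIDR size, 1..128
    addr = net.network_address.packed       # 16 raw bytes
    nbytes = -(-(size - prefix_bits) // 8)  # (size - P) bits, rounded up
    body = addr[prefix_bits // 8 : prefix_bits // 8 + nbytes]
    return bytes([size - 1]) + body         # X=0, SSSSSSS = size - 1

print(pack_entry("2001:1234:5678:9123::/64", 16).hex())
# -> 3f123456789123, matching the worked example above
```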

Each blob is as big as will fit in a DNS answer.  If you don't believe
in EDNS0, the limit is about 450 bytes.  If you do believe in EDNS0,
it's whatever size you think clients ask for, probably in the 1K to 2K
range.

Estimated performance:

I'm guessing that with EDNS0 you can probably get about 100 entries
per blob.  That means a two-level tree is 10,000 entries, a three
level tree is a million entries, and a four level tree is 100 million
CIDRs, with about a million blobs.  Large v4 DNSBLs like the CBL have
upwards of seven million entries now, so this should be manageable.
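A back-of-the-envelope check of that arithmetic, taking the 100
entries per blob as the guess it is:

```python
# Sanity check of the capacity estimates above.  The 100-entries-per-
# blob figure is the text's guess, not a measurement.
def capacity(levels, per_blob=100):
    """Total CIDRs a tree of the given depth can hold."""
    return per_blob ** levels

def blob_count(levels, per_blob=100):
    """Interior plus leaf blobs: 1 + 100 + 100^2 + ..."""
    return sum(per_blob ** i for i in range(levels))

for depth in (2, 3, 4):
    print(depth, capacity(depth), blob_count(depth))
# depth 4: 100,000,000 CIDRs in 1,010,101 blobs -- "about a million"
```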

The number of queries for any particular lookup is the number of
levels, which is unlikely to be more than four.  The cache behavior
obviously depends on both the layout of the entries and the query
pattern, but this avoids some obvious worst cases.  If a /64 is either
entirely listed, not listed at all, or just has one or two /128s, all
queries for addresses in that /64 will fetch the same four records.

Topics left to nail down:

* Efficiently turning a list of CIDRs into blobs.  I think I have a pretty
  good handle on this, estimate the depth of the tree, rounding up, and
  then fill the tree walking down and up from the root.

* Update management.  I think that you set the TTL to something
  reasonable, 15 minutes or an hour or whatever, and whenever you
  change the data, you leave the old entries around for one or two
  TTLs, perhaps changing those entries' TTLs to something smaller to
  discourage caches from keeping stale data.

* Compression schemes.  There might be better ways to do prefix
  compression, e.g., a per-blob field that says how many bits are
  the same as the previous blob.  Entries could be bit- rather than
  byte-aligned within a blob, although I expect that would be a lot
  more work for minimal extra compression.  There may be clever
  tricks to allocate records to blobs to maximize the size of the
  prefix.
_______________________________________________
Asrg mailing list
Asrg(_at_)irtf(_dot_)org
http://www.irtf.org/mailman/listinfo/asrg