Re: Re: DNS load research

On Fri, 2005-03-25 at 07:36, Frank Ellermann wrote:

Ralf Doeblitz wrote:

 [mx check]

Of course you do - unless you are willing to accept mail
from senders that you may not be able to send a DSN to.


Then Andy's idea with a weight 1 for any mx mechanism was
correct.  And my idea to add the number of MXs per q=mx for
some kind of "overall DNS query count" was probably wrong.


I think we're both right.... see below.

An example where Radu's limits are really more restrictive
than Wayne's limits would be nice, when does the reply for
a q=mx _not_ include the relevant IPs ?


I think part of the problem is that even the same person is not always
talking about the same kind of measurement.  There are a bunch of
different ways to interpret even the label "query count"; I count at
least three different ways it has been used:
     1. number of calls to a (libc?) resolver API
     2. number of RRs returned from a single DNS query packet
     3. number of DNS packets going out (or coming in) the public
        interface

Which one is used in any given measurement seems to be entirely
dependent on the exact implementation of both the name server and the
resolver library.

Where the cache is implemented is also unknown.  A resolver API could
cache results in your process's address space, and avoid going to the
local name server -- when glibc reads /etc/hosts during a gethostbyname
call, it caches the results so it doesn't need to parse the file again. 
The local recursive name server could do caching too.
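
To make the three interpretations concrete, here is a minimal Python
sketch of a toy resolver with an in-process cache that keeps all three
counts side by side.  Everything in it is hypothetical: the ZONE data,
the CachingResolver class and its resolve() method are made up for
illustration and are not any real resolver API.

```python
from dataclasses import dataclass, field

# Hypothetical zone data; names and addresses are made up for illustration.
ZONE = {
    ("example.com", "MX"): ["mx1.example.com", "mx2.example.com"],
    ("mx1.example.com", "A"): ["192.0.2.1"],
    ("mx2.example.com", "A"): ["192.0.2.2", "192.0.2.3"],
}


@dataclass
class Counters:
    api_calls: int = 0      # interpretation 1: calls into the resolver API
    rrs_returned: int = 0   # interpretation 2: records handed back
    packets_sent: int = 0   # interpretation 3: queries leaving the host


@dataclass
class CachingResolver:
    """Toy resolver with an in-process cache (an assumption, not a real API)."""
    counters: Counters = field(default_factory=Counters)
    cache: dict = field(default_factory=dict)

    def resolve(self, name, rrtype):
        self.counters.api_calls += 1
        key = (name, rrtype)
        if key not in self.cache:
            # Cache miss: the only path that would put a packet on the wire.
            self.counters.packets_sent += 1
            self.cache[key] = ZONE.get(key, [])
        records = self.cache[key]
        self.counters.rrs_returned += len(records)
        return records


def run_demo():
    resolver = CachingResolver()
    for _ in range(2):  # perform the same MX expansion twice
        for mx in resolver.resolve("example.com", "MX"):
            resolver.resolve(mx, "A")
    return resolver.counters
```

With this made-up data, run_demo() reports 6 API calls and 10 records
returned, but only 3 simulated packets, so the three "query counts"
disagree for exactly the same work.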

Radu has specifically mentioned a function named "ns_resolv" -- I am
personally not aware of which API that is a part of, but my take was
that it was of type 1 in the above list.

I don't do much DNS resolving code writing, but from looking at the
functions available in resolver(3), I think SPF evaluators are more
likely to call a function with a prototype of (name, type) and get back
a list of entries that match only the specified type rather than one
that decodes DNS formatted packets (the former, being more abstract, is
more futureproof, expandable and scalable, which may play into someone's
choice of API).  This is just my opinion, as a programmer in general,
not as someone who has written an SPF evaluator.  The fact that the
actual query result packet may include additional records is beside the
point, because the SPF evaluator never sees them -- but the local
caching name server does, and may add them to its cache, from which
subsequent requests are then served (this is the whole point of the
additional section -- there's a good chance that if you are asking for
MX, you're going to need the As for those also, so I'll include them).
So the SPF evaluator does:

        mxes = resolve('pobox.com', 'MX');
        foreach mxrr of mxes
            resolve (mxrr, 'A')

For pobox.com this is a total of 9 calls to my hypothetical resolve()
function.  Any given DNS query for MX may or may not result in
additional A DNS queries to resolve the contents of the MX if the local
DNS didn't already get populated with parts of the results from previous
queries.
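
The difference between calls made by the evaluator and queries sent
upstream can be sketched in Python as below.  This is only a toy model
under stated assumptions: the RESPONSES table, the LocalNameServer
class and the mx_addresses() helper are hypothetical, and the domains
are invented so that one response carries its A records in the
additional section and the other does not.

```python
# Hypothetical responses: question -> (answer section, additional section).
# "small.example" gets its A record in the additional section of the MX
# response; "big.example" does not.
RESPONSES = {
    ("small.example", "MX"): (
        ["mx.small.example"],
        {("mx.small.example", "A"): ["192.0.2.10"]},
    ),
    ("mx.small.example", "A"): (["192.0.2.10"], {}),
    ("big.example", "MX"): (["mx1.big.example", "mx2.big.example"], {}),
    ("mx1.big.example", "A"): (["192.0.2.21"], {}),
    ("mx2.big.example", "A"): (["192.0.2.22"], {}),
}


class LocalNameServer:
    """Toy caching name server that also caches additional-section records."""

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def query(self, name, rrtype):
        key = (name, rrtype)
        if key not in self.cache:
            self.upstream_queries += 1
            answer, additional = RESPONSES[key]
            self.cache[key] = answer
            # The evaluator never sees these, but later lookups hit them.
            self.cache.update(additional)
        return self.cache[key]


def mx_addresses(server, domain):
    """The evaluator's view: one MX call, then one A call per exchanger."""
    calls = 1
    addresses = []
    for mx in server.query(domain, "MX"):
        calls += 1
        addresses.extend(server.query(mx, "A"))
    return addresses, calls
```

In this model the evaluator makes 2 calls for small.example while the
name server sends only 1 upstream query; for big.example it is 3 calls
and 3 upstream queries.  The evaluator's code is identical in both
cases.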

For example, you should rarely have to actually send a packet for a DNS
query for NS records after performing some other query, because the NS
records are typically included in the authority section of the
response, and should be in the local name server cache.

Compare what's returned by "dig mx pobox.com" and "dig mx
leave-it-to-grace.com".  Note the contents of the "additional section".

For whatever reason (I suspect packet size), A records for the pobox.com
MXs _are_not_ included in the response.  leave-it-to-grace.com's
response _does_ contain the A record for the sole MX.

Even more interesting is "dig mx hotmail.com".  All the A records for
the given MXes are included in the "additional section"; hotmail has
fewer MXes, but they expand to a huge list of As.  But, again, size must
be the issue, and thankfully, the additional section doesn't appear to
be populated unless all the expanded records would fit (so no partial
expansion of MX to A).  Note that MXes are relatively expensive to
encode because they are names.  The FQDNs for the pobox.com MXes are
upwards of 16 bytes each, a space in which at least 3 IPs will fit (with
a byte for their RR type).  So the more MXes you have, using Radu's load
calculations, the more load you put on your own name servers (even in
the typical, non-SPF-doom-attack case), and the greater the chance of
you having a failure (if failures are somehow correlated to load).
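
The size comparison can be checked with a few lines of Python.  This is
a rough sketch that follows my back-of-the-envelope accounting above (4
bytes per IPv4 address plus 1 byte of bookkeeping), not the full RR
layout: a real record also carries an owner name, type, class, TTL and
RDLENGTH, and name compression can shrink repeated names.  The host
name used is a made-up example, and both function names are
hypothetical.

```python
def wire_name_length(fqdn):
    """Uncompressed DNS wire length of a name.

    Each label costs one length byte plus its characters, and the name
    ends with a single zero byte for the root label.
    """
    labels = [label for label in fqdn.rstrip(".").split(".") if label]
    return sum(1 + len(label) for label in labels) + 1


# Rough accounting from the text above: 4 address bytes + 1 extra byte.
BYTES_PER_ADDRESS = 4 + 1


def addresses_fitting(fqdn):
    """How many IPv4 addresses fit in the space one encoded name takes."""
    return wire_name_length(fqdn) // BYTES_PER_ADDRESS
```

For the made-up name "mx1.example.com", wire_name_length() gives 17
bytes and addresses_fitting() gives 3.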

I'm sure this is all documented in a DNS RFC somewhere. :)

This all makes me, in part, think that being as concerned as we have
been with DNS load for a single application (SPF) breaks the abstraction
that using resolver libraries and interfacing with caching name servers
affords us.  If SPF causes DNS load to go up, then what is being
abstracted (behind the local resolver library that SPF evaluators never
see) can be optimized without breaking any of the applications that are
using that abstraction.  Certain DNS query optimizations that SPF
evaluators can make to bypass the provided abstractions do not lend
themselves well to good partitioning in the design.  This is a trade off.  I
think that the existence of the DNS name resolving abstractions means
that we _can_not_ (by definition of abstraction) get meaningful numbers
of load based on simple counts of queries or API calls, which is why I
think a lot of this load discussion is academic.  I still assert that if
DNS load is worse from actually _using_ DNS in the way it was designed
(and nothing in SPF seems to go against DNS usage design), then DNS is
broken (in that it doesn't scale as well as we thought it would) and
will be fixed.  This is independent of SPF.

-- 
Andy Bakun <spf(_at_)leave-it-to-grace(_dot_)com>