On Mon, 2005-03-21 at 14:01 -0500, Radu Hociung wrote:
> This is not quite correct. When you get incoming spam, you have to
> look up MX records for domains that you otherwise would have no
> reason to look up, as you don't correspond with them.
>
> The MX is an indirect mech. Every time you see MX, be ready for at
> least two queries: one to get the list of MX mailers, and at least
> one to get the A record of the first mailer. Also, when you see MX,
> you have no idea how many lookups it will take to get to the bottom
> of it; you have to do one query to find out.
>
> For these reasons, the MX mechanism is at least twice as expensive
> as an A mechanism.
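
For reference, the two-step chain being described looks roughly like
this (a sketch in Python with the dnspython library; example.com is
just a placeholder):

    # Sketch: the lookup chain behind a single "mx" mechanism.
    import dns.resolver

    domain = "example.com"

    # Query 1: fetch the list of MX mailers for the domain.
    mx_answer = dns.resolver.resolve(domain, "MX")

    # Queries 2..N: one A lookup per mailer, unless the server already
    # returned the addresses in the additional section of the response.
    for mx in mx_answer:
        for rr in dns.resolver.resolve(mx.exchange, "A"):
            print(mx.exchange, rr.address)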
I'm trying to get away from the "number of queries" count and think of
things in terms of how they differ from normal MTA operation. That is,
MTAs normally have to look up MX records; the complexity behind the MX
RR is part of resolving an MX.

While, yes, you would have to look up the MX for domains that you
would not normally correspond with when resolving an SPF record that
uses an mx mechanism, this is no different from having to handle
increased email load in general, and the result is largely cacheable.
> Also, MX mechanisms really are worthless, but expensive:
>
> When you list an MX mechanism, there can be two possible scenarios:
>
> 1. You control that MX mechanism.
>    So you know all the mailers, and you should list them (by IP :) )
As I've said before, the mx mechanism eases maintenance, and if we
need to encourage the use of SPF compilers or add code to DNS servers
to replace mx with a list of ip4s at query time, then so be it. Using
arguments like "you can do easy maintenance with a makefile and
macros" is disingenuous, because not all DNS information, even within
the same zone, is maintained by the same person with the same
permissions or even on the same system. There are LDAP and RDBMS
systems out there that store enterprise-wide network topology
information, and different departments can be responsible for
different records and parts of the tree that eventually get served by
DNS. Guy recently said it better than I can, I think.
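
To be concrete, the kind of compiler I mean could be as small as this
sketch (Python with the dnspython library; the domain, the record, and
the function name are placeholders, and it handles only a bare mx
term):

    # Sketch of an "SPF compiler": expand the mx mechanism into explicit
    # ip4 mechanisms at zone-generation time, so verifiers pay no extra
    # queries. Uses the dnspython library; all names are placeholders.
    import dns.resolver

    def compile_spf(domain, record):
        out = []
        for term in record.split():
            if term == "mx":
                # Replace "mx" with one ip4 per address of each MX host.
                for mx in dns.resolver.resolve(domain, "MX"):
                    for rr in dns.resolver.resolve(mx.exchange, "A"):
                        out.append("ip4:" + rr.address)
            else:
                out.append(term)
        return " ".join(out)

    print(compile_spf("example.com", "v=spf1 mx -all"))
    # might print: v=spf1 ip4:192.0.2.10 ip4:192.0.2.11 -all

Run whenever the zone is regenerated, this keeps the maintenance
convenience of mx for the publisher without passing its query cost on
to every verifier.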
> 2. You don't control it, it's in someone else's domain.
>    So you don't know the mailers, and you're guessing.
>
> You've a better chance of guessing wrong than right, as in many
> installations, outgoing mail goes through different servers than
> incoming (which is what MX is for).
>
> We used t-online.de as an example of this guess-work.
See above about delegation of duties. I would venture to guess that in
many cases, mx should be avoided, but that's really something for each
person to determine for their own setup. Part of the cookbook/best
practices document could be:
    If your domain only has a single server that both sends and
    receives all email for that domain, then your SPF record could be
    as simple as:

        v=spf1 mx -all

    In other, more complex cases (multiple MX records, sending and
    receiving duties split between different machines), mx should be
    avoided (see section blah blah for reasons to avoid mx in complex
    network topologies).
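
The complex-case advice could then show the outgoing mailers listed
directly; for example (addresses purely illustrative):

        v=spf1 ip4:192.0.2.25 ip4:192.0.2.26 -all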
> I have not mentioned this before, because these macros are the
> "complex setups" that make hard things possible. This was one of the
> fundamental reasons for SPF's existence, so I'll let it be.
>
> The DNS limit, however, acts as an amplifier to the costs of the
> macros. So it should be kept as low as possible.
This is unintuitive logic, but I can see why you are suggesting it --
since the limit amplifies the cost, we should have a lower limit to
keep things from getting seriously out of control. But this depends on
the exact kind of amplification (geometric, exponential, etc.) the
mechanism causes. Does this really matter? No one is suggesting that
mechanisms with fast query-count growth have different limits than
slow-growth ones. There's still a hard limit for total mechanism
evaluation.
Unfortunately, counting TOTAL queries, as you did above in your MX
resolution explanation, does not make the mx in my SPF record
comparable with the mx in someone else's. Let's say the limit (for the
sake of this example) is 2 queries, or that the mx appears in the SPF
record at just the right place for the failure to occur:
    MX on aol.com returns 4 hosts, which explode into 17 total A
    records, for 21 records total, and 5 queries total (although it
    doesn't, which I'll get to in a minute).

    MX on leave-it-to-grace.com returns one host, which requires an A
    lookup (although it doesn't, which I'll get to in a minute), for a
    total of 2 queries.
In this situation, mail from aol.com that happens to come from just
the right machine at the right time would get a PermFail based purely
on the order of the records returned by the DNS server. This is a bad
thing, because it is completely unpredictable. As such, the limits
should be high enough to maintain the utility of having complex setups
described by MX records.
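
The order dependence is easy to see if you write the budget check out.
This is only a sketch: the limit of 2, the host names, and the
addresses are the toy values from above, and addresses_of stands in
for the per-mailer A lookups:

    # Sketch: a single DNS-lookup budget makes the verdict depend on how
    # many hosts someone else's MX returns, and in what order.
    LIMIT = 2  # the toy limit from the example above

    def check_mx(mx_hosts, addresses_of, sender_ip):
        """addresses_of maps a mailer name to its A records."""
        queries = 1                 # the MX lookup itself
        for host in mx_hosts:       # order comes from the DNS server
            queries += 1            # one A lookup per mailer
            if queries > LIMIT:
                return "PermFail"   # budget blown mid-mechanism
            if sender_ip in addresses_of[host]:
                return "Pass"
        return "NoMatch"

    # A one-mailer domain always fits in the budget:
    print(check_mx(["mail.example.com"],
                   {"mail.example.com": ["192.0.2.25"]}, "192.0.2.25"))

    # A four-mailer domain: a sender behind the second mailer passes or
    # PermFails depending purely on the order the records came back in.
    hosts = ["mx1", "mx2", "mx3", "mx4"]
    addrs = {h: ["198.51.100.%d" % i] for i, h in enumerate(hosts)}
    print(check_mx(hosts, addrs, "198.51.100.1"))  # PermFail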
As for the thing I said I'd get to in a minute... looking up MX
records returns the A records of the MX targets in the additional
section of the result, which makes both the aol.com and
leave-it-to-grace.com MX lookups only 1 query each. Again, there's
nothing to stop a smart DNS server from populating the additional
section of the result of a TXT query for an SPF record with
information to avoid further queries. If this existed (and the market
might bring this about), the exact cost (in number of queries) would
then differ based on the exact version of the DNS software being run
by the SPF-serving domain. Again, a reason to avoid a total lookup
count and to just compare mechanisms' expenses in relation to each
other.
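
You can check what a given server hands back today with a few lines of
dnspython (a sketch; whether the additional section is populated
depends entirely on the server being asked):

    # Sketch: see whether the server already included the mailers'
    # addresses in the additional section, saving the follow-up A queries.
    import dns.resolver

    answer = dns.resolver.resolve("aol.com", "MX")
    for rrset in answer.response.additional:
        print(rrset)  # A records for the MX hosts, if the server sent them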
In light of your explanation of MX, I'm agreeable to weighting mx more
than a, but not nearly as heavily as exists should be weighted. This
would properly encourage people to use mx only when it makes sense.
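
Something as simple as a weight table charged against one total budget
would do; the relative ordering (a below mx, exists heaviest) is the
point, and every number here is purely illustrative:

    # Sketch of per-mechanism weights charged against one total budget.
    # Only the relative ordering matters; the numbers are illustrative.
    WEIGHTS = {"ip4": 0, "ip6": 0, "a": 1, "mx": 2, "exists": 5}
    BUDGET = 20

    def over_budget(mechanisms):
        return sum(WEIGHTS.get(m, 1) for m in mechanisms) > BUDGET

    print(over_budget(["mx", "a"]))  # False: a cheap, sane record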
> If the goal is to have everyone publish SPF, we must deal with that
> scenario. How much will it cost our DNS infrastructure if everyone
> published SPF? Currently only a small percentage do, and looking at
> the traffic numbers I don't like where they are headed.
The only thing I can suggest to combat this is that SPF not be deployed
at all. Then our DNS numbers would be great, but our stats for forged
email that no one wants would be in the toilet. I, for one, would like
to throw money at getting more bandwidth and CPU power so I don't have
to deal with forged email (_your_ mileage may vary). SPF actually has
the capability to do this, in a predictable, scalable way that gets to
the root of the problem (unlike Bayesian filters and MUA-level checks
(SenderID), which require the mail to be accepted first).
"Boy, this world-wide-web thing is great, but now my DNS servers are
serving X times as many requests than when I just ran a gopher server."
I'm not sure what the problem is with the usage numbers for a service
going up if the service is actually being used. Is DNS the best way to
distribute this information? Maybe not, but it gets the job done and is
a well understood technology (at least compared to technologies that
don't exist yet). Does the utility of SPF outweigh the increased DNS
usage? Is DNS cheaper than other methods (already explored ad
nauseam, like distributing SPF records via HTTP)? Are there ways people
can make their SPF DNS usage cheaper through SPF record compiling and
optimizations?
> I think you ask a great question, but to answer it well it would
> require a lot more research and thought.
Agreed. Let's keep it going.
--
Andy Bakun <spf@leave-it-to-grace.com>