spf-discuss

Re: DNS lookup limits

2005-03-25 17:10:36
Andy Bakun wrote:
In <42447730.3010200@ohmi.org>, Radu Hociung wrote:

I'm proposing we count calls to the resolver library. Anything else is guesswork.


In <4244694F.3000608@ohmi.org>, Radu Hociung wrote:

Let's call it unchallenged, not correct. It's completely up to the DNS server implementation whether it sends information it wasn't asked for but suspects would be useful. bind9 does send out as much info as possible, but apparently AOL's NS servers do not send the additional info (try nslookup -debug -type=mx aol.com dns-01.ns.aol.com).

Besides, if an MX contains a list of A records of many long-host-names-as-MTA-servers.com, there may not be enough room in one UDP packet for the IP addresses of those hosts, and maybe not even enough room for all the names.

The DNS server truncates the name list and then round-robin rotates it, so the truncated-out entries get an equal opportunity to serve mail requests. In this case there would be no additional records, and each subsequent A query will generate traffic.
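(As an aside, a rough way to check this behaviour from Python with the dnspython library -- this sketch is mine, not part of the original argument; the server name is the one mentioned above, everything else is an assumption:)

    import dns.flags
    import dns.message
    import dns.query
    import dns.resolver

    # Illustrative sketch: ask one of aol.com's authoritative servers for
    # its MX records and see whether the matching A records arrive in the
    # "additional" section of the same UDP response, and whether the
    # response was truncated (TC bit set).
    ns_ip = dns.resolver.resolve("dns-01.ns.aol.com", "A")[0].address

    query = dns.message.make_query("aol.com", "MX")
    response = dns.query.udp(query, ns_ip, timeout=5)

    print("answer:    ", response.answer)       # the MX records themselves
    print("additional:", response.additional)   # A records, if volunteered
    print("truncated: ", bool(response.flags & dns.flags.TC))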


This calls into question the usefulness of your calculations that showed
gigabytes transferred because of a high number of queries performed, er,
"calls to the resolver library".

It is not a one-to-one mapping of "calls to the resolver library" to
"bytes that traverse the public interface", because of the design of the
DNS.

When you call the resolver about a domain you haven't seen before (or recently), that resolver call will most certainly map to a packet across the net. If that domain has a complex SPF, that may be 20 packets across the net.


If you see the same domain very frequently, the number of resolver calls that end up on the net is N * (P / max(TTL, 1/F))

The number of calls to the resolver is N * P * F

Where N is the number of query calls required by the SPF record.
P is the observation period; it must be much larger than the TTL for a meaningful result.
The TTL is assumed to be the same for all DNS mechanisms, for simplicity.
F is the frequency with which you see email from the same domain.

So, for a TTL of 1000 seconds, an SPF record requiring 5 resolver calls, a frequency of 0.1 (1 mail every 10 seconds), and an observation period of 100,000 seconds, you call the resolver 50,000 times, and 500 of those calls end up on the net.
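(To make the arithmetic concrete, a tiny sketch in Python using the variable names defined above -- the helper functions are hypothetical, purely illustrative:)

    def resolver_calls(n, p, f):
        # Total calls into the resolver library over the observation period.
        return n * p * f

    def on_net_packets(n, p, f, ttl):
        # Calls that miss the cache and actually reach the network.
        return n * (p / max(ttl, 1 / f))

    # The worked example above: N=5, P=100,000 s, F=0.1 mails/s, TTL=1,000 s
    print(resolver_calls(5, 100_000, 0.1))         # 50000.0
    print(on_net_packets(5, 100_000, 0.1, 1000))   # 500.0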

The number of DNS mechanisms affects the generated traffic linearly.
The TTL of the records affects the traffic in inverse proportion.

You can see that for infrequently seen domains (such as vanity domains with one or two users), i.e. F < 1/TTL, the number of packets sent to the Internet is a linear function of N: in that case max(TTL, 1/F) = 1/F, so every resolver call goes out on the wire and the packet count is simply N * P * F.

The fact is, whichever way you calculate it, the number of packets on the net is proportional to the number of calls to the resolver function, if all other variables are kept the same. In turn, the number of resolver calls is proportional to the number of DNS mechanisms, if the characteristics/features of the cache/DNS infrastructure remain the same.

I.e., if the resolver always returns the A records along with the MX record, a record with 2 MX mechanisms is twice as expensive as one with 1 MX mechanism.

The DNS features may help alleviate some of the expense, but that is not guaranteed for all MTAs. Likewise, a compiler built into the server will help lower the load, but there is no guarantee that an MTA will only request SPF records from servers with compilers built in.

I'm still trying to figure out if we are concentrating on optimizing the
typical case (normal mail volume) or the atypical case (SPF-doom
attack).

We're working on optimizing the worst of the normal cases, while minimizing the incentive for, and the damage caused by, the SPF-doom attack.

If we let the DNS mechanism limit be 111, there would be no need for the spfcompiler, but the temptation to write the virus would be off the charts.

On the other hand, if we set the DNS limit to 1, there would be no temptation to write the virus, but the worst case (a complicated network) could not be accommodated.

We're looking for the middle ground.

If we had the spfcompiler built in, the middle ground could be a very low DNS mechanism limit.


One thing that makes the atypical case uninteresting for me is that it
exists ONLY because SMTP lets forgery happen.  There will be a
transitioning period where a virus attack that uses the forgery vector
to propagate will be attempted, and it may be a big hit on DNS, but
because the hole has been closed, it won't work or at least won't be as
serious as it would have been with the hole still open (that is, there
will be new problems to solve, rather than revisiting the same ol'
problems again and again).  Since the vector of attack is now closed, it
will be useless to attempt to exploit it.

I can see your point, but forgery-free-day has not been scheduled yet ;)

This does not mean that new attack vectors won't be discovered -- such
as an attack against SPF (perhaps indirectly).  If that happens,
hopefully the value of anti-forgery will have already been seen, and if
it is difficult, if not impossible, to close that new attack vector,
then SPF will be replaced with some other anti-forgery method.  We have
not gone back to gopher just because web pages have increased our
bandwidth costs and required our servers to be beefier.

Yes, but we did make often-used HTML cheap (A anchors, <B>, <U>, etc.) and the more exotic stuff expensive (<SCRIPT>, <TABLE>). Anyway, the graphical display is seen as value worth having, but while SPF in itself adds some value, allowing more DNS expense than necessary does not add any incremental value.

I am perfectly fine with a solution like RMX because I find much of the
SPF syntax to be sugary.

The syntactic sugar makes the SPF publisher's job simple at the expense
of SPF evaluators being more complex.  All these dire predictions of
SPF's failure make it seem like weaknesses were purposely built into the
system, and now we're running out of fingers to stick in the dam.  I am
not convinced that anything useful can be done to tip the scales in the
other direction (make the evaluation simpler) without losing the
syntactic sugar that helped put SPF ahead of the other proposals that
were/are on the table.  How much of SPF's success-so-far is because
anyone can add their records with less than 15 minutes' worth of work (whether
those records are correct is another issue) -- it's this
simplicity that has gotten SPF the mindshare it has.

Perhaps hobbyists see it as a 15-minute solution, but companies pay fortunes for spam-detection software and the IT staff to maintain it, and they are willing to spend more time in return for a guarantee of authentic email (the victims of phishing are the first to give it more than 15 minutes of serious thought).

SPF is more valuable to some than to others, and I think there are plenty who would spend more than 15 minutes if SPF were more robust, more efficient, and less prone to the next big DDoS.


I think there's a lot of concentration on making everyone happy, and not
enough concentration on the actual problem of reducing forgery.

I think we all agree that SPF is at least part of the solution to forgery, so we're concentrating on making it work in the most reliable way possible.


Unfortunately, we're here now, so we kind of have to live with what's
been provided. SPF is really starting to look like a dog's breakfast. It does everything if you use it, and yet if you use it, it does
nothing, or sometimes, even worse than nothing.

I agree it started to look like a dog's breakfast when all the Microsoft BS politics and fear-mongering was going on. I'm not happy to see that this elected "Council" appears so uninterested in SPF, but perhaps my perception is wrong.

I think that while we have made some good progress, uncovered some weaknesses, and found some solutions, the most remarkable thing is the level of involvement and the apparent desire to make SPF work.

If we didn't believe in SPF, we wouldn't be here, still talking.

So cheer up! We will fix this email problem, and I believe SPF will be part of the fix.

Radu.