spf-discuss

Re: DNS lookup limits

2005-03-25 17:10:36
Andy Bakun wrote:
In <42447730.3010200@ohmi.org>, Radu Hociung wrote:

I'm proposing we count calls to the resolver library. Anything else is guesswork.


In <4244694F.3000608@ohmi.org>, Radu Hociung wrote:

Let's call it unchallenged, not correct. It's completely up to the DNS server implementation whether it sends information it wasn't asked for but suspects would be useful. bind9 does send out as much info as possible, but apparently AOL's NS servers do not send the additional info (try nslookup -debug -type=mx aol.com dns-01.ns.aol.com).

Besides, if an MX contains a list of A records of many long-host-names-as-MTA-servers.com, there may not be enough room in one UDP packet for the IP addresses of those hosts, and maybe not even enough room for all the names.

The DNS server truncates the name list and then round-robin rotates it, so the truncated-out entries get an equal opportunity to serve mail requests. In this case there would be no additional records, and each subsequent A query will generate traffic.
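(As an aside, a rough way to check this behaviour from Python with the dnspython library -- this sketch is mine, not part of the original argument; the server name is the one mentioned above, everything else is an assumption:)

    import dns.flags
    import dns.message
    import dns.query
    import dns.resolver

    # Illustrative sketch: ask one of aol.com's authoritative servers for
    # its MX records and see whether the matching A records arrive in the
    # "additional" section of the same UDP response, and whether the
    # response was truncated (TC bit set).
    ns_ip = dns.resolver.resolve("dns-01.ns.aol.com", "A")[0].address

    query = dns.message.make_query("aol.com", "MX")
    response = dns.query.udp(query, ns_ip, timeout=5)

    print("answer:    ", response.answer)       # the MX records themselves
    print("additional:", response.additional)   # A records, if volunteered
    print("truncated: ", bool(response.flags & dns.flags.TC))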


This calls into question the usefulness of your calculations that showed
gigabytes transferred because of a high number of queries performed, er,
"calls to the resolver library".

It is not a one-to-one mapping of "calls to the resolver library" to
"bytes that traverse the public interface", because of the design of the
DNS.

When you call the resolver about a domain you haven't seen before (or recently), that resolver call will most certainly map to a packet across the net. If that domain has a complex SPF, that may be 20 packets across the net.


If you see the same domain very frequently, the number of resolver calls that end up on the net is N * (P / max(TTL, 1/F))

The number of calls to the resolver is N * P * F

Where N is the number of query calls required by the SPF record.
P is the observation period; it must be much larger than the TTL for a meaningful result.
The TTL is assumed to be the same for all DNS mechanisms, for simplicity.
F is the frequency with which you see email from the same domain.

So, for a TTL of 1000 seconds, an SPF record requiring 5 resolver calls, a frequency of 0.1 (1 mail every 10 seconds), and an observation period of 100,000 seconds, you call the resolver 50,000 times, and 500 of those calls end up on the net.
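(To make the arithmetic concrete, a tiny sketch in Python using the variable names defined above -- the helper functions are hypothetical, purely illustrative:)

    def resolver_calls(n, p, f):
        # Total calls into the resolver library over the observation period.
        return n * p * f

    def on_net_packets(n, p, f, ttl):
        # Calls that miss the cache and actually reach the network.
        return n * (p / max(ttl, 1 / f))

    # The worked example above: N=5, P=100,000 s, F=0.1 mails/s, TTL=1,000 s
    print(resolver_calls(5, 100_000, 0.1))         # 50000.0
    print(on_net_packets(5, 100_000, 0.1, 1000))   # 500.0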

The number of DNS mechanisms affects the generated traffic linearly.
The TTL of the records affects the traffic in inverse proportion.

You can see that for infrequently seen domains (such as vanity domains with one or two users), i.e. F < 1/TTL, the number of packets sent to the Internet is a linear function of N: in that case max(TTL, 1/F) = 1/F, so every resolver call goes out on the wire and the packet count is simply N * P * F.

The fact is, whichever way you calculate it, the number of packets on the net is proportional to the number of calls to the resolver function, if all other variables are kept the same. In turn, the number of resolver calls is proportional to the number of DNS mechanisms, if the characteristics/features of the cache/DNS infrastructure remain the same.

I.e., if the resolver always returns the A records along with the MX record, a record with 2 MX mechanisms is twice as expensive as one with 1 MX mechanism.

The DNS features may help alleviate some of the expense, but that is not guaranteed for all MTAs. Likewise, a compiler built into the server will help lower the load, but there is no guarantee that an MTA will only request SPF records from servers with compilers built in.

I'm still trying to figure out if we are concentrating on optimizing the
typical case (normal mail volume) or the atypical case (SPF-doom
attack).

We're working on optimizing the worst of the normal cases, while minimizing the incentive for, and the damage caused by, the SPF-doom attack.

If we let the DNS mechanism limit be 111, there would be no need for the spfcompiler, but the temptation to write the virus would be off the charts.

On the other hand, if we set the DNS limit to 1, there would be no temptation to write the virus, but the worst case (a complicated network) could not be accommodated.

We're looking for the middle ground.

If we had the spfcompiler built in, the middle ground could be a very low DNS mechanism limit.


One thing that makes the atypical case uninteresting for me is that it
exists ONLY because SMTP lets forgery happen.  There will be a
transitioning period where a virus attack that uses the forgery vector
to propagate will be attempted, and it may be a big hit on DNS, but
because the hole has been closed, it won't work or at least won't be as
serious as it would have been with the hole still open (that is, there
will be new problems to solve, rather than revisiting the same ol'
problems again and again).  Since the vector of attack is now closed, it
will be useless to attempt to exploit it.

I can see your point, but forgery-free-day has not been scheduled yet ;)

This does not mean that new attack vectors won't be discovered -- such
as an attack against SPF (perhaps indirectly).  If that happens,
hopefully the value of anti-forgery will have already been seen, and if
it is difficult, if not impossible, to close that new attack vector,
then SPF will be replaced with some other anti-forgery method.  We have
not gone back to gopher just because web pages have increased our
bandwidth costs and required our servers to be beefier.

Yes, but we did make often-used HTML cheap (A anchors, <B>, <U>, etc.) and the more exotic stuff expensive (<SCRIPT>, <TABLE>). Anyway, the graphical display is seen as value worth having, but while SPF in itself adds some value, allowing more DNS expense than necessary does not add any incremental value.

I am perfectly fine with a solution like RMX because I find much of the
SPF syntax to be sugary.

The syntactic sugar makes the SPF publisher's job simple at the expense
of SPF evaluators being more complex.  All these dire predictions of
SPF's failure make it seem like weaknesses were purposely built into the
system, and now we're running out of fingers to stick in the dam.  I am
not convinced that anything useful can be done to tip the scales in the
other direction (make the evaluation simpler) without losing the
syntactic sugar that helped put SPF ahead of the other proposals that
were/are on the table.  How much of SPF's success-so-far is because
anyone can add their records with less than 15 minutes' worth of work (whether
those records are correct is another issue) -- it's this
simplicity that has gotten SPF the mindshare it has.

Perhaps hobbyists see it as a 15-minute solution, but companies pay fortunes for spam-detection software and the IT staff to maintain it, and they are willing to spend more time in return for a guarantee of authentic email (the victims of phishing are the first to give it more than 15 minutes of serious thought).

SPF is more valuable to some than to others, and I think there are plenty who would spend more than 15 minutes if SPF were more robust, more efficient, and less prone to the next big DDoS.


I think there's a lot of concentration on making everyone happy, and not
enough concentration on the actual problem of reducing forgery.

I think we all agree that SPF is at least part of the solution to forgery, so we're concentrating on making it work in the most reliable way possible.


Unfortunately, we're here now, so we kind of have to live with what's
been provided. SPF is really starting to look like a dog's breakfast. It does everything if you use it, and yet if you use it, it does
nothing, or sometimes, even worse than nothing.

I agree it started to look like a dog's breakfast when all the Microsoft BS politics and fear-mongering was going on. I'm not happy to see that this elected "Council" appears so uninterested in SPF, but perhaps my perception is wrong.

I think that while we have made some good progress, uncovered some weaknesses, and found some solutions, the most remarkable thing is the level of involvement and the apparent desire to make SPF work.

If we didn't believe in SPF, we wouldn't be here, still talking.

So cheer up! We will fix this email problem, and I believe SPF will be part of the fix.

Radu.