spf-discuss

Need for Complexity in SPF Records

2005-03-27 14:36:51
Radu, I wrote this response yesterday, then today decided it doesn't sound quite right. I'm really not as sure of what I'm saying as it sounds. Show me I'm wrong, and I'll re-double my efforts to find solutions that don't abandon what is already in SPF, solutions like your mask modifier. Examples are the best way to do that. Your example.com below is almost there, but it still doesn't tell me why we really need exists and redirect.

At 07:21 PM 3/26/2005 -0500, Radu wrote:
David MacQuigg wrote:
At 04:06 PM 3/26/2005 -0500, Radu wrote:
David MacQuigg wrote:

Now I'm confused. If the reason for masks is *not* to avoid sending multiple packets, and *only* to avoid processing mechanisms that require another lookup, why do we need these lookups on the client side? Why can't the compiler do whatever lookups the client would do, and make the client's job as simple as possible?

Sorry for creating confusion.

Say that you have a policy that compiles to 1500 bytes.

The compiler will split it into 4 records of about 400 bytes each.

example.com     IN TXT \
     "v=spf1 exists:{i}.{d} ip4:... redirect=_s1.{d2} m=-65/8 m=24/8"
_s1.example.com IN TXT "v=spf1 ip4:.... .... ....  redirect=_s2.{d2}"
_s2.example.com IN TXT "v=spf1 ip4:.... .... ....  redirect=_s3.{d2}"
_s3.example.com IN TXT "v=spf1 ip4:.... .... ....  -all"

We want the mask to be applied after the exists:{i}.{d}. Since that mechanism was in the initial query and cannot be expanded to a list of IPs, the mask cannot possibly apply to it.

I think what you are saying is that the compiler can't get this down to a simple list of IPs, because we need redirects containing macros that depend on information only the client has. So if we are to put the burden of complex SPF evaluations on the server side, where it belongs, it seems we have to pass all the necessary information to the server in the initial query. We already pass the domain name. Adding the IP address should not be a big burden, and it would have some other benefits we discussed.

If you can find a way to do that and still keep the query cacheable, let me know. If it is compatible with the way DNS works currently, I'll even listen and pay attention. ;)

That 1 UDP packet might not seem like a lot. But currently it is cacheable and most of the time is not even seen on the internet. Making it uncacheable would be a many-fold burden on bandwidth. That's exactly why caching and the TTL mechanism were invented, and now you suggest we give it up?

No, I see your point. If we truly need %{i} macros, and we evaluate them on the server side, that would produce a different response record for every IP address, and it might not make sense to cache such records. Responses for SPF records with no %{i} macros would cache as always. The %{d} macros would not impair caching. Even the %{i} responses might be worth caching for a few minutes, if you are getting hammered by one IP.
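To make that concrete (the addresses here are made up): with a record like

   example.com  IN TXT  "v=spf1 exists:%{i}.%{d} -all"

a message from 192.0.2.1 ends up asking about the name 192.0.2.1.example.com, while one from 198.51.100.7 asks about 198.51.100.7.example.com. Every sending IP produces a different name, and therefore a different answer, so nothing useful accumulates in anyone's cache; only the TXT record holding the macro itself gets any benefit from caching.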

Whether the loss of caching on a few records is too high a price depends on the severity of the threatened abuse. Should we tolerate a small increase in DNS load for the normal flow of email, to limit the worst-case abuse of the %{i} macro? I don't know.

What I *would* do is discourage the widespread use of macros, redirects, and includes, and state in the standard that processing of records with these features SHOULD be lower priority than processing simple records. That may help to implement a defense mode if these features are abused.

Maybe I'm just not seeing the necessity of setups like the above example.com. I'm sure someone could come up with a scenario where it would be real nice if all SPF checkers could run a Perl script embedded in an SPF record, but we have to ask, is that really necessary to verify a domain name?

The "..." imply a list of ip4: mechanism that is 400-bytes long. That's why the chaining is necessary. ebay.com has something like that. hotmail.com uses something similar too. When you have lots of outgoing servers, you need more space to list them, no?

Why can't they make each group of servers a sub-domain with its own simple DNS records, as rr.com has done with its subdomains? _s3.example.com can have as many servers as can be listed in a 400 byte SPF record, and that includes some racks with hundreds of servers listed in one 20 byte piece of the 400 byte record. With normal clustering of addresses, I would think you could list thousands of servers in each subdomain, with nothing but ip4's in the SPF record.
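Something like this sketch is all I have in mind (addresses made up for illustration):

   example.com      IN TXT "v=spf1 ip4:192.0.2.0/25 -all"
   _s1.example.com  IN TXT "v=spf1 ip4:192.0.2.128/25 -all"
   _s2.example.com  IN TXT "v=spf1 ip4:198.51.100.0/24 -all"
   _s3.example.com  IN TXT "v=spf1 ip4:203.0.113.0/24 -all"

Each record is flat, nothing but ip4's and -all, no macros, no lookups, and DNS itself carries the hierarchy.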

As I understand it, users sending mail from _s3.example.com will still see 'example.com' in their headers, but the envelope address will be the real one, _s3.example.com. That's the one that needs to authenticate, and the one that will inherit its reputation from example.com.

Seems to me this is using DNS exactly the way it was intended, distributing the data out to the lowest levels, and avoiding the need to construct hierarchies within the SPF records. Sure, it can be done, but what is the advantage over just putting simple records at the lowest levels, and letting DNS take care of the hierarchy? Why does ebay.com need four levels of hierarchy in its SPF records?

If we simply can't sell SPF without all these whiz-bang features, I would say put it *all* on the server side. All the client should have to do is ask - "Hey <domain> is this <ip> OK?" We dropped that idea because it doesn't allow caching on the client side, but with a simple PASS/FAIL response, the cost of no caching is only one UDP round trip per email. This seems like small change compared to worries about runaway redirects, malicious macros, etc.
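For the sake of argument, the wire format could be as simple as a TXT lookup with the IP folded into the query name (the _spfq label is made up here, not anything in the current spec):

   ;; query, for a connection from 192.0.2.1 claiming example.com
   192.0.2.1._spfq.example.com.  IN TXT  ?
   ;; answer, generated however example.com likes
   192.0.2.1._spfq.example.com.  IN TXT  "PASS"

One UDP round trip, a one-word verdict, and all the complexity stays behind example.com's name servers.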

I'll humour you:

This server-side processing would not be happening on a caching server, correct? That would not save anything. I hope you agree.

If the caching server were in the domain which created the expensive SPF record, then it would save traffic to and from the client, at the expense of traffic within the domain that deserves it. If example.com needs 100 queries within their network to answer my simple query "Is this <ip> OK?", then they need to think about how to better organize their records. All I need is a simple PASS/FAIL, or preferably a list of IP blocks that I can cache to avoid future queries. (This should be the server's choice.)

What I *don't* want in answer to my simple query, is a complex script to run every time I have a similar query. That seems to be the fundamental source of our problem. SPF needs to focus on its core mission, authenticating domain names, and doing just that as efficiently and securely as possible. All these complex features seem to be motivated by a desire to provide functionality beyond the core mission - constructing DNS block lists, etc. Now we are finding that the complex features are not only slowing us down, but have opened up some unnecessary vulnerabilities to abuse.

So the only place where it might make a difference is if the evaluation was run on the authoritative server for the domain.

The problem with that is that authoritative servers are designed with performance and reliability in mind (as opposed to caching servers, which care more about cost savings). As such, the auth servers *do not* do recursive queries, as an SPF record evaluation might require. They also do not do any caching. They respond to every query with a piece of data they already have in memory or on disk. If they don't have that piece of information, they return an empty response or "it doesn't exist" (NXDOMAIN). They never look for it anywhere else. That's why they are authoritative. If they don't know about it, it doesn't exist.

Now, the spf compiler only makes sense if it is running on a master server. The master for a zone is itself authoritative. The above authoritative servers are slaves. They take the information from the master server and disseminate it as if it were their own. It is the administrator of the master zone server that allows them to do so. No other server can respond authoritatively to queries for the zone in question.

So, the only place the spf compiler makes sense is on the master server, because ultimately, it is the only one who really knows the facts. When the facts change, the master informs the slaves, which do a zone transfer in order to update their databases. So the truth propagates nearly instantly from the master to the slaves, and as such the slaves can be seen as clones of the master, identical in every way, except for the 3 second delay it takes them to transfer the zone files.
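As a sketch of what that looks like (the serial number is made up): the compiler writes the flattened records into the master's zone file and bumps the SOA serial, and the ordinary NOTIFY and zone-transfer machinery carries the change to the slaves:

   example.com.     IN SOA ns1.example.com. hostmaster.example.com. (
                           2005032701  ; serial, bumped after each compile
                           3600 900 604800 3600 )
   example.com.     IN TXT "v=spf1 ip4:192.0.2.0/24 redirect=_s1.example.com"
   _s1.example.com. IN TXT "v=spf1 ip4:198.51.100.0/24 -all"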

You cannot run the compiler on the slaves, because they might each evaluate the record differently, as they are coping with different network conditions (such as lost packets). They would each tell a different "truth" from each other and from the master server, and would no longer be authoritative.

Now, having the master zone server respond to queries that require it to do calculations of any kind is an absolute no-no. That is because no matter how big the zone is (yahoo, rr, whatever), there is only one master. Ok, there may be a couple, but their job is not to respond to queries, but to 'hold the authority'. The slaves are for responding to queries.

I would also say the slaves are the right machines on which to do whatever complex lookups are needed to answer a query. The owners of those machines are the only ones who will make the tradeoff of cost vs desired complexity.

So doing what you propose would require the DNS system to be turned upside down. The justification of SPF is just not good enough.

I don't see how this turns anything upside down. DNS is supposed to be decentralized. If complex lookups are necessary, having a bunch of slave servers do the work on behalf of a master server is consistent with decentralization.

Let's estimate the worst-case load on DNS if we say "no lookups, one packet only in any response". I'm guessing 90% of domains will provide a simple, compiled, cacheable list of IP blocks. This is as good as it gets, with the possible exception of a fallback to TCP if the packet is too long. The 10% with really complex policies may have a big burden from queries and computations within their own network, but what goes across the Internet is a simple UDP packet with a PASS or FAIL.

That response is not cacheable, but let's compare the added load to some other things that happen with each email. Setting up a TCP connection is a minimum of three packets. SMTP takes two packets for the HELO and response. MAIL FROM is another two. Then we need two for the authentication. At that point we can send a reject (one packet) and terminate the connection (4 packets).
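Adding those up:

   TCP handshake            3 packets
   HELO + response          2
   MAIL FROM + response     2
   SPF query + response     2
   reject                   1
   connection teardown      4
   -------------------------------
   total                   14 packets

so the two-packet SPF exchange is roughly one seventh of even that minimal conversation.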

Looks to me like the additional load on DNS is insignificant for normal mail, and only a small fraction of the minimum traffic per email in a DoS storm. Also, the additional load is primarily on the domain with the expensive SPF records, where it should be. Even if this were a spammer domain, and they weren't *really* doing any internal lookups, the load on their DNS server is two packets for every additional two-packet load on the victims. No amplification factor here.

How about this: All SPF records SHOULD be compiled down to a list of IPs. If you need more than that, then do as much as you like, but give the client a simple PASS or FAIL. Most domains will then say "Here is our list of IPs. Don't ask again for X hours." Only a few will say "Our policy is so complex, you can't possibly understand it. Send us every IP you want checked."
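For most domains the compiled answer could be as plain as this (addresses made up for illustration):

   example.com  IN TXT  "v=spf1 ip4:192.0.2.0/24 ip4:198.51.100.0/24 -all"

Nothing for the client to evaluate beyond matching the connecting IP against two blocks, and the whole record caches for its TTL. The few domains with policies too complex to compile would instead be handed each IP and answer on the fly.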

That's exactly what the exists:{i}.domain does. It tells the domain's server every IP the client wants checked, and the server checks it. Unfortunately, it is extremely expensive because it's AGAU.

If I were writing an SPF-doom virus, this is where I would start.

I need to get back to designing ICs. :>)

Nah... you've got some great ideas and I value your contribution and feedback.

And I appreciate your time in getting me up to speed on these problems. I hope one day I can return the favor.

-- Dave
************************************************************     *
* David MacQuigg, PhD      email:  dmquigg-spf at yahoo.com      *  *
* IC Design Engineer            phone:  USA 520-721-4583      *  *  *
* Analog Design Methodologies                                 *  *  *
*                                   9320 East Mikelyn Lane     * * *
* VRS Consulting, P.C.              Tucson, Arizona 85710        *
************************************************************ *

