spf-discuss

Re: Performance issues

2004-02-22 02:12:02
--Hector Santos <winserver(_dot_)support(_at_)winserver(_dot_)com> wrote:

One of the drawbacks of LMAP-based proposals such as DMP and SPF is
the potentially high overhead of DNS lookups.

LMAP proposals have a major initial benefit - validating your own local
domains.  This is, no doubt, the #1 benefit gained - protection against
local domain spoofing.

However, with 60-80% of spammers being "spoofers", including those using
invalid domains, there is considerable overhead in failed DNS lookups
for external domains.


I am not sure if anyone answered this part of your message... it looks like not. Please ignore me if I am repeating something that you already talked about.

It concerns me that you used the word "failed" to describe DNS lookups in the case of spoofing, and "overhead" as well. I might be misunderstanding you, but in case I am not, perhaps this info is helpful.

A name that does not exist should give you a "nonexistent" message (NXDOMAIN) as quickly as (sometimes more quickly than) a name that does exist. NXDOMAIN is a perfectly legitimate answer for a DNS server to give, just as "Unknown user" is a perfectly reasonable response from an SMTP server. This is quite different from a query that has "failed".

This applies equally well to domains that are not registered, as well as to domains that have no SPF TXT records. Hence:

 > dig 12304980.com
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 57543
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
 ;; AUTHORITY SECTION:
com. 10800 IN SOA a.gtld-servers.net. nstld.verisign-grs.com. 1077384466 1800 900 604800 900
 ;; Query time: 132 msec

What is going on here is:
1. Find the NS servers for "." (the root of all DNS).  One is:
 .      47267   IN      NS      A.ROOT-SERVERS.NET.
Server probably has this cached, since it is the start of every query in the world.

2. Find the NS servers for "com."  One of these is:
 com.   172800  IN      NS      a.gtld-servers.net.
Server may have this cached, depending on whether you use the "com" TLD more than once every two days. (That's a joke, son, I say, a joke!)

3. Ask the servers for "com" about "12304980.com"  Get NXDOMAIN.

In my case this is done in 132 ms. Most mail servers will do this anyway to see if the domain is fake. If the domain itself is fake, there is no reason to proceed with any kind of SPF lookup: those will always come back nonexistent if the answer for the domain itself was NXDOMAIN.
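As a rough sketch of that short-circuit (not taken from any real SPF implementation): the `lookup` callable and its `(status, answers)` return shape are hypothetical stand-ins for whatever resolver library you actually use, and `spf_records` is just a name made up for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical lookup signature: (name, rrtype) -> (status, answers).
# A real implementation would wrap whatever resolver library is in use.
Lookup = Callable[[str, str], Tuple[str, List[str]]]


def spf_records(domain: str, lookup: Lookup) -> List[str]:
    """Return SPF-looking TXT records, skipping the TXT query for fake domains."""
    status, _ = lookup(domain, "A")
    if status == "NXDOMAIN":
        # The name does not exist at all, so a TXT query would only
        # repeat the same NXDOMAIN answer. Stop here.
        return []
    status, answers = lookup(domain, "TXT")
    if status != "NOERROR":
        # Simplification for this sketch: treat any non-answer as "no records".
        return []
    return [txt for txt in answers if txt.startswith("v=spf1")]
```

The point is only that the second query is never sent when the first one already said NXDOMAIN.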


The same should be true for a domain that exists but has no TXT records.

 > dig microsoft.com txt
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28954
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
 ;; AUTHORITY SECTION:
microsoft.com. 3600 IN SOA dns.cp.msft.net. msnhst.microsoft.com. 2004022011 300 600 2419200 3600
 ;; Query time: 25 msec

Similar to the previous example, "." and "com." NS records should be cached. So steps 1 and 2 are the same and nearly free.

3. Ask the servers for "com." for any NS records for "microsoft.com." Get multiple answers, one is
 microsoft.com.   3600    IN      NS      dns1.dc.msft.net.

4. Ask the servers for "microsoft.com." for TXT records for "microsoft.com." In this case the answer is 0 records and the status is "NOERROR". Not NXDOMAIN, because microsoft.com exists; it just doesn't have any records of the type we want.


Now, that said, you will sometimes get failures in trying to resolve some domains, especially if the domain is set up incorrectly or the NS servers are down, unreachable, or blocking you. This is the case that I would call "Failure" and it does take considerable time to fail.
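To make the distinction concrete, here is a minimal sketch that sorts a response into the cases described above. The function name and the category labels are made up for illustration; `status` is the header status as dig prints it, and `None` is my own convention here for "no server answered".

```python
from typing import Optional


def classify(status: Optional[str], answer_count: int = 0) -> str:
    """Map a DNS response to one of the outcomes discussed above.

    `status` is the header status as dig prints it, or None if no
    server answered before the resolver gave up.
    """
    if status is None:
        return "failure"        # timed out: the slow, expensive case
    if status == "NXDOMAIN":
        return "nonexistent"    # a legitimate, quick, definitive answer
    if status == "NOERROR":
        # The name exists; it may simply have no records of this type.
        return "answer" if answer_count > 0 else "no-records"
    return "failure"            # SERVFAIL and other error statuses
```

Only the "failure" bucket is the one I would call overhead; the other three are ordinary answers.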

Here is one of the failures from my mailserver just now.
reject=451 4.1.8 Domain of sender address college_fm(_at_)printenv(_dot_)com does not resolve

In this case I get valid answers for the NS records, indicating that it is registered and has been pointed at two nameservers:
 # dig printenv.com ns
 printenv.com.           3870    IN      NS      ns0.arizonnet.com.
 printenv.com.           3870    IN      NS      ns1.arizonnet.com.
but when I try to get answers from either of those servers I get a failure message:
 # dig printenv.com. @ns0.arizonnet.com.

 ; <<>> DiG 9.2.1 <<>> printenv.com. @ns0.arizonnet.com.
 ;; global options:  printcmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34065
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
 ;; Query time: 74 msec

This one was fast, which means for some reason the nameserver is up and running but intentionally giving an "error" response for that domain (perhaps they are suspended for spamming :)

Here is another failure, only much slower...

reject=451 4.1.8 Domain of sender address LNeely(_at_)opendesk(_dot_)com does not resolve

Trying to look that up I get

# dig opendesk.com
;; connection timed out; no servers could be reached

after 14.07 seconds. There are two NS servers but neither of them seems to be answering the phone. This can happen if the DNS is set up badly, either intentionally or due to error... which is why, if I bounce messages from these domains, I give them a message that says 4xx - try again when you have fixed your DNS problem.
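A sketch of that policy, under some assumptions: the 451 text is copied from the log lines above, but the 553 reply for a nonexistent domain is my own illustrative choice (nothing above says which reply to use there), and the function name is hypothetical. As before, `None` stands for a timeout.

```python
from typing import Optional


def sender_domain_reply(address: str, status: Optional[str]) -> Optional[str]:
    """Return an SMTP reply line for a bad sender domain, or None to carry on."""
    if status == "NOERROR":
        return None
    if status == "NXDOMAIN":
        # Assumption: a domain that definitively does not exist gets a
        # permanent (5xx) rejection. Adjust to local policy.
        return f"553 5.1.8 Domain of sender address {address} does not exist"
    # SERVFAIL, timeout (None), and so on: temporary (4xx), so a legitimate
    # sender can retry once the DNS problem is fixed.
    return f"451 4.1.8 Domain of sender address {address} does not resolve"
```

For example, `sender_domain_reply("user@example.com", None)` yields the 451 line, which tells the sending server to try again later.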

And, even worse, this kind of "failed" query generally cannot be cached the way real answers can (even NXDOMAIN and NOERROR-but-0-records results can be cached). The best you can do here is set the timeouts in your resolver setup as aggressively as you can.
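For the cacheable negative answers, a minimal sketch of how a cache might hold them, assuming the usual negative-caching convention (RFC 2308) that the lifetime is the smaller of the SOA record's own TTL and its last (minimum) field. The class and method names are hypothetical, and a real cache would also treat NXDOMAIN as covering every record type for that name.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class NegativeCache:
    """Remember NXDOMAIN / no-records answers until their negative TTL expires."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[Tuple[str, str], Tuple[str, float]] = {}

    def store(self, name: str, rrtype: str, outcome: str,
              soa_ttl: int, soa_minimum: int) -> None:
        # Assumed RFC 2308 rule: negative TTL = min(SOA TTL, SOA minimum).
        ttl = min(soa_ttl, soa_minimum)
        self._entries[(name.lower(), rrtype)] = (outcome, self._clock() + ttl)

    def get(self, name: str, rrtype: str) -> Optional[str]:
        key = (name.lower(), rrtype)
        entry = self._entries.get(key)
        if entry is None:
            return None
        outcome, expires = entry
        if self._clock() >= expires:
            del self._entries[key]   # expired: the caller must ask again
            return None
        return outcome
```

With the numbers from the first dig output above (SOA TTL 10800, last field 900), `store("12304980.com", "A", "NXDOMAIN", 10800, 900)` would keep the answer for 900 seconds.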



This has been an empirical result of our implementation of LMAP based
solutions which include DMP and SPF for the past 3-4 months in production
operations.

This is why we were forced to give SMTP system operators (sysops) an option
that offers a "list of LMAP domains" to check.   Of course, how this list is
generated is not the point.

The point is that simply relying on DNS caching is not sufficient.  You may
need to also do your own "intelligent" caching and learning of results.


Probably true... but you should also try a few different DNS servers. Install your own, if you haven't yet. I have a caching DNS server running on the same system for each high-volume mail server, just as I would with a web crawler or anything else that needs to look up things from the outside world without hosing my normal nameserver with excessive query load.

If your DNS server is not on the same machine or the same physical network, check whether there are problems passing UDP packets... sometimes UDP packets above a certain size do not make it through intact.

I would also recommend that you examine the resolver implementation on the box itself to see if it is slowing you down (or actually getting things wrong). I can't think of how, but it's possible.


Just consider that if SPF is going to be widely deployed, there is going
to be a lot of network traffic.  We need to do our best to minimize this
at both the sysop end and the software end.

I don't know if this is as big a concern as you might think. I think that doing extra lookups will slow things down on the mail server, but I don't see it adding that much more bandwidth, for example. A UDP query is much lower overhead than setting up a TCP session (like some "callback" schemes do) and is probably about the same as checking reverse-DNS (which is starting to become common practice among large mail servers.) Also if we detect forgeries early enough in the transaction and we don't have to receive their DATA, that might save more bandwidth than we are adding.

Good luck!
gregc
--
Greg Connor <gconnor(_at_)nekodojo(_dot_)org>

