Re: Performance issues

--Hector Santos <winserver(_dot_)support(_at_)winserver(_dot_)com> wrote:

> One of the drawbacks of LMAP-based proposals such as DMP and SPF is
> the high overhead potential in DNS lookups.

> LMAP proposals have a major initial benefit: validating your own local
> domains.  This is, no doubt, the #1 benefit gained: protection against
> local domain spoofing.

> However, with 60-80% of spammers being "spoofers", including those using
> invalid domains, there is considerable overhead in failed DNS lookups
> for external domains.

I am not sure if anyone answered this part of your message... it looks like not. Please ignore me if I am repeating something that you already talked about.

It concerns me that you used the word "failed" to describe DNS lookups in the case of spoofing, and "overhead" as well. I might be misunderstanding you, but in case I am not, perhaps this info is helpful.

A name that does not exist should give you a "nonexistent" message (NXDOMAIN) as quickly as a name that does exist, and sometimes more quickly. NXDOMAIN is a perfectly legitimate answer for a DNS server to give, just as "Unknown user" is a perfectly reasonable response from an SMTP server. This is quite different from a query that has "failed".
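To make the distinction concrete, here is a minimal Python sketch that sorts a DNS response into a real answer, NXDOMAIN, the name-exists-but-no-records case, and an actual failure. The RCODE values come from RFC 1035; the `Outcome` enum and `classify()` function are hypothetical names for illustration, not part of any resolver library.

```python
from enum import Enum
from typing import Optional

# DNS RCODE values as defined in RFC 1035, section 4.1.1.
RCODE_NOERROR = 0
RCODE_NXDOMAIN = 3


class Outcome(Enum):
    ANSWER = "answer"      # NOERROR with records: the data exists
    NODATA = "nodata"      # NOERROR with no records: name exists, type doesn't
    NXDOMAIN = "nxdomain"  # "no such name": a legitimate, definite answer
    FAILURE = "failure"    # SERVFAIL, REFUSED, timeout: no real answer at all


def classify(rcode: Optional[int], answer_count: int,
             timed_out: bool = False) -> Outcome:
    """Map a DNS response (or the lack of one) onto the four outcomes."""
    if timed_out or rcode is None:
        return Outcome.FAILURE
    if rcode == RCODE_NXDOMAIN:
        return Outcome.NXDOMAIN
    if rcode == RCODE_NOERROR:
        return Outcome.ANSWER if answer_count > 0 else Outcome.NODATA
    # SERVFAIL (2) and every other error code: the question went unanswered.
    return Outcome.FAILURE
```

Only the last outcome deserves to be called a failure; the other three are answers a mail server can act on immediately.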

This applies equally well to domains that are not registered and to domains that have no SPF TXT records. For example:


 > dig 12304980.com
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 57543
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
 ;; AUTHORITY SECTION:
 com.   10800   IN   SOA   a.gtld-servers.net. nstld.verisign-grs.com. 1077384466 1800 900 604800 900

 ;; Query time: 132 msec

What is going on here is:
1. Find the NS servers for "." (the root of all DNS). One is:
 .      47267   IN      NS      A.ROOT-SERVERS.NET.

The server probably has this cached, since it is the start of every query in the world.


2. Find the NS servers for "com.". One of these is:
 com.   172800  IN      NS      a.gtld-servers.net.

The server may have this cached, depending on whether you use the "com" TLD more than once every two days. (That's a joke, son, I say, a joke!)


3. Ask the servers for "com." about "12304980.com". Get NXDOMAIN.
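The three steps above can be modeled with a toy Python sketch that counts how many queries actually go out on the wire, given the set of zones whose NS records are already cached. It assumes every ancestor of the queried name is its own delegated zone and ignores TTL expiry; `ancestor_zones()` and `network_queries_needed()` are hypothetical helpers, not a real resolver.

```python
from typing import List, Set


def ancestor_zones(name: str) -> List[str]:
    """Zones whose servers must be known before asking about `name`, root first.

    Simplifying assumption: every ancestor of the name is a delegated zone,
    e.g. '12304980.com' -> ['.', 'com.'].
    """
    labels = name.rstrip(".").split(".")
    zones = ["."]
    for i in range(len(labels) - 1, 0, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones


def network_queries_needed(name: str, ns_cache: Set[str]) -> int:
    """Count queries that leave the box for one lookup of `name`.

    Each ancestor zone whose NS records are not cached costs one query;
    the final question to the closest zone's servers always costs one.
    """
    queries = 0
    for zone in ancestor_zones(name):
        if zone not in ns_cache:
            queries += 1
            ns_cache.add(zone)  # remember the delegation for later lookups
    return queries + 1
```

With only the root cached, the first lookup of 12304980.com costs two queries (the "com." NS records plus the final question); every later lookup under "com." costs just one.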

In my case this took 132 ms. Most mail servers will do this anyway to see whether the domain is fake. If the domain itself is fake, there is no reason to proceed with any kind of SPF lookup: if the answer for the domain itself was NXDOMAIN, the SPF lookups will always come back nonexistent too.
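That short-circuit fits in a few lines of Python. The resolver hooks here are hypothetical stand-ins passed in as functions: each returns `None` for NXDOMAIN and otherwise a (possibly empty) list of record strings. Parsing the SPF record itself is left out of this sketch.

```python
from typing import Callable, List, Optional

# Hypothetical resolver hook: None means NXDOMAIN; otherwise the records found
# (an empty list means the name exists but has no records of that type).
Lookup = Callable[[str], Optional[List[str]]]


def txt_for_spf(domain: str, lookup_a: Lookup,
                lookup_txt: Lookup) -> Optional[List[str]]:
    """Fetch TXT records for an SPF check, skipping the query for fake domains."""
    if lookup_a(domain) is None:
        # The domain itself is NXDOMAIN, so the TXT lookup would be too.
        return None
    return lookup_txt(domain) or []
```

The existence check the mail server already does pays for itself here: a fake domain costs one query instead of two.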



The same should be true for a domain that exists but has no TXT records.

 > dig microsoft.com txt
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28954
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
 ;; AUTHORITY SECTION:
 microsoft.com.   3600   IN   SOA   dns.cp.msft.net. msnhst.microsoft.com. 2004022011 300 600 2419200 3600

 ;; Query time: 25 msec

As in the previous example, the "." and "com." NS records should be cached, so steps 1 and 2 are the same and nearly free.

3. Ask the servers for "com." for the NS records of "microsoft.com.". Get multiple answers; one is:

 microsoft.com.   3600    IN      NS      dns1.dc.msft.net.

4. Ask the servers for "microsoft.com." for TXT records for "microsoft.com.". In this case the answer is 0 records and the status is "NOERROR", not NXDOMAIN, because microsoft.com exists; it just doesn't have any records of the type we want.
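Both kinds of negative answer can be cached, but they cover different ground. Per RFC 2308, NXDOMAIN says nothing at all exists at the name, while NOERROR-with-no-records covers only the type that was asked about, and either is cached for the smaller of the SOA record's TTL and its MINIMUM field. A toy Python sketch follows; `NegativeCache` and `negative_ttl()` are hypothetical names, and the clock is injectable so it can be tested.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def negative_ttl(soa_ttl: int, soa_minimum: int) -> int:
    """RFC 2308: cache a negative answer for min(SOA TTL, SOA MINIMUM)."""
    return min(soa_ttl, soa_minimum)


class NegativeCache:
    """Toy negative cache: NXDOMAIN covers every type at a name,
    NODATA covers only the (name, type) pair that was asked about."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._nxdomain: Dict[str, float] = {}
        self._nodata: Dict[Tuple[str, str], float] = {}

    def add_nxdomain(self, name: str, ttl: int) -> None:
        self._nxdomain[name.lower()] = self._clock() + ttl

    def add_nodata(self, name: str, rtype: str, ttl: int) -> None:
        self._nodata[(name.lower(), rtype.upper())] = self._clock() + ttl

    def lookup(self, name: str, rtype: str) -> Optional[str]:
        """Return 'NXDOMAIN', 'NODATA', or None if nothing fresh is cached."""
        now = self._clock()
        expiry = self._nxdomain.get(name.lower())
        if expiry is not None and expiry > now:
            return "NXDOMAIN"
        expiry = self._nodata.get((name.lower(), rtype.upper()))
        if expiry is not None and expiry > now:
            return "NODATA"
        return None
```

Plugging in the SOA records from the two dig outputs: the com. SOA (TTL 10800, MINIMUM 900) gives 900 seconds for 12304980.com, and the microsoft.com SOA (TTL 3600, MINIMUM 3600) gives 3600 seconds for the missing TXT records.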

Now, that said, you will sometimes get failures when trying to resolve some domains, especially if the domain is set up incorrectly or the NS servers are down, unreachable, or blocking you. This is the case that I would call "failure", and it can take considerable time to fail.


Here is one of the failures from my mailserver just now.

reject=451 4.1.8 Domain of sender address college_fm(_at_)printenv(_dot_)com does not resolve

In this case I get valid answers for the NS records, indicating that it is registered and has been pointed at two nameservers:

 # dig printenv.com ns
 printenv.com.           3870    IN      NS      ns0.arizonnet.com.
 printenv.com.           3870    IN      NS      ns1.arizonnet.com.

but when I try to get answers from either of those servers I get a failure message:

 # dig printenv.com. @ns0.arizonnet.com.

 ; <<>> DiG 9.2.1 <<>> printenv.com. @ns0.arizonnet.com.
 ;; global options:  printcmd
 ;; Got answer:
 ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34065
 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
 ;; Query time: 74 msec

This one was fast, which means that for some reason the nameserver is up and running but intentionally giving an "error" response for that domain (perhaps they were suspended for spamming :)


Here is another failure, only much slower...

reject=451 4.1.8 Domain of sender address LNeely(_at_)opendesk(_dot_)com does not resolve


Trying to look that up I get

 # dig opendesk.com
 ;; connection timed out; no servers could be reached

after 14.07 seconds. There are two NS servers, but neither of them seems to be answering the phone. This can happen if the DNS is set up badly, either intentionally or by mistake... which is why, when I bounce messages from these domains, I give them a 4xx reply that says, in effect, "try again when you have fixed your DNS problem."
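Both failures above (the fast SERVFAIL from printenv.com's servers and the slow timeout for opendesk.com) end up as the same temporary rejection. A minimal Python sketch of that policy, reusing the reply text from the log lines above; the function name and outcome strings are hypothetical, and what to do with NXDOMAIN or a normal answer is deliberately left to the rest of the policy.

```python
from typing import Optional


def sender_domain_reply(address: str, outcome: str) -> Optional[str]:
    """Hypothetical policy: DNS trouble gets a temporary (4xx) SMTP rejection.

    `outcome` is one of "answer", "nodata", "nxdomain", "servfail", "timeout".
    Returns the SMTP reply to send, or None to let the transaction continue.
    """
    if outcome in ("servfail", "timeout"):
        # 451 is a transient (4yz) reply; enhanced code 4.1.8 is
        # "bad sender's system address" (RFC 3463). The sender may retry.
        return f"451 4.1.8 Domain of sender address {address} does not resolve"
    # Other outcomes are left to the rest of the policy in this sketch.
    return None
```

Because the reply is temporary, a legitimate sender whose DNS was only briefly broken loses nothing but some delay.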

And, even worse, this "failed" query cannot be cached (even NXDOMAIN and NOERROR-but-0-records results can be cached). The best you can do here is to set your timeouts as aggressively as possible in the resolver setup.
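The system resolver's timeouts are normally tuned in its configuration (for example the `timeout:` and `attempts:` options in glibc's resolv.conf). The same idea applied by hand looks roughly like the Python sketch below: a minimal RFC 1035 query sent over UDP with a short socket timeout and a bounded number of retries. `build_query()` and `query_with_timeout()` are hypothetical helpers, and the response is returned raw rather than parsed.

```python
import random
import socket
import struct
from typing import Optional

QTYPE_A = 1
QTYPE_TXT = 16
QCLASS_IN = 1


def build_query(name: str, qtype: int, query_id: int) -> bytes:
    """Encode a minimal DNS query (RFC 1035, section 4.1) with RD set."""
    # ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"  # terminating root label
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def query_with_timeout(server: str, name: str, qtype: int,
                       timeout: float = 1.0, attempts: int = 2) -> Optional[bytes]:
    """Send a UDP query; give up after `attempts` tries of `timeout` seconds.

    Returns the raw response, or None if the server never answered. Worst
    case, this blocks for roughly timeout * attempts seconds.
    """
    query_id = random.randint(0, 0xFFFF)
    packet = build_query(name, qtype, query_id)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for _ in range(attempts):
            sock.sendto(packet, (server, 53))
            try:
                data, _addr = sock.recvfrom(4096)
            except socket.timeout:
                continue  # no answer in time: try again, then give up
            # Ignore stray replies whose ID doesn't match our query.
            if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == query_id:
                return data
    return None
```

For example, `query_with_timeout("127.0.0.1", "opendesk.com", QTYPE_A, timeout=1.0, attempts=2)` would give up after roughly two seconds instead of the 14 seconds seen above, assuming a resolver is listening on the local host.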

> This has been an empirical result of our implementation of LMAP-based
> solutions, including DMP and SPF, over the past 3-4 months in production
> operations.

> This is why we were forced to give SMTP system operators (sysops) an
> option that offers a "list of LMAP domains" to check.  Of course, how this
> list is generated is not the point.

> The point is simply that relying on DNS caching is not sufficient.  You
> may also need to do your own "intelligent" caching and learning of results.

Probably true... but you should also try a few different DNS servers. Install your own, if you haven't yet. I have a caching DNS server running on the same system as each high-volume mail server, just as I would for a web crawler or anything else that needs to look things up in the outside world, so that I don't hose my normal nameserver with excessive query load.
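Hector's suggestion of doing your own caching fits the failure case best, since that is exactly the result the DNS layer will not cache for you. Here is a hypothetical Python sketch of an application-level hold-down list: once a domain's lookup fails, further messages from it can be deferred without another long wait until the hold-down expires. The class name and the 300-second default are assumptions, not recommendations from this thread.

```python
import time
from typing import Callable, Dict


class FailureHoldDown:
    """Hypothetical app-level cache: remember domains whose lookups failed
    (SERVFAIL or timeout) so the next message skips the long wait."""

    def __init__(self, hold_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._hold = hold_seconds
        self._clock = clock
        self._until: Dict[str, float] = {}

    def record_failure(self, domain: str) -> None:
        self._until[domain.lower()] = self._clock() + self._hold

    def recently_failed(self, domain: str) -> bool:
        key = domain.lower()
        until = self._until.get(key)
        if until is None:
            return False
        if until <= self._clock():
            del self._until[key]  # hold-down expired: allow a fresh lookup
            return False
        return True
```

A mail server could consult `recently_failed()` before issuing the DNS query and go straight to the 4xx reply when it returns True.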

If your DNS server is not on the same machine or the same physical network, check whether there are problems passing UDP packets... sometimes UDP packets above a certain size may not be routed correctly or something?

I would also recommend that you examine the resolver implementation on the box itself to see if it is slowing you down (or actually getting things wrong). I can't think of how, but it's possible.

> Just consider that if SPF is going to be widely deployed, there is going
> to be a lot of network traffic.  We need to do our best to minimize this
> at both the sysop end and the software end.

I don't know if this is as big a concern as you might think. I think that doing extra lookups will slow things down on the mail server, but I don't see it adding that much bandwidth, for example. A UDP query has much lower overhead than setting up a TCP session (as some "callback" schemes do) and probably costs about the same as checking reverse DNS (which is starting to become common practice among large mail servers). Also, if we detect forgeries early enough in the transaction that we don't have to receive their DATA, that might save more bandwidth than we are adding.
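For a rough sense of scale, the size of a single DNS question on the wire can be worked out from the RFC 1035 layout: a 12-byte header, the name as length-prefixed labels plus a terminating zero byte, and 4 bytes of type and class, wrapped in an 8-byte UDP header and a 20-byte IPv4 header (assuming no IP options and no EDNS). A minimal Python sketch; the helper name is hypothetical.

```python
def udp_query_bytes(name: str) -> int:
    """Bytes one IPv4/UDP DNS question occupies, excluding link-layer framing."""
    labels = name.rstrip(".").split(".")
    qname = sum(1 + len(label) for label in labels) + 1  # length octets + root
    dns = 12 + qname + 4  # header + QNAME + QTYPE/QCLASS
    return 20 + 8 + dns   # IPv4 header (no options) + UDP header
```

For microsoft.com that comes to 59 bytes in a single packet. By comparison, a TCP connection exchanges three segments of at least 40 bytes each (20-byte IPv4 header plus 20-byte TCP header) in the handshake alone, before any data is sent.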


Good luck!
gregc
--
Greg Connor <gconnor(_at_)nekodojo(_dot_)org>