--Hector Santos <winserver(_dot_)support(_at_)winserver(_dot_)com> wrote:
One of the drawbacks with LMAP-based proposals such as DMP and SPF is
the high overhead potential in DNS lookups.
LMAP proposals have a major initial benefit - validating your own local
domains. This is, no doubt, the #1 benefit gained: protection against local
domain spoofing.
However, with 60-80% of the spammers being "spoofers", including those using
invalid domains, there is a considerable overhead in failed DNS lookups
for external domains.
I am not sure if anyone answered this part of your message... it looks like
not. Please ignore me if I am repeating something that you already talked
about.
It concerns me that you used the word "failed" to describe DNS lookups in
the case of spoofing, and "overhead" as well. I might be misunderstanding
you, but in case I am not, perhaps this info is helpful.
A name that does not exist should give you a "nonexistent" message
(NXDOMAIN) as quickly as (sometimes more quickly than) a name that does exist.
NXDOMAIN is a perfectly legitimate answer for a DNS server to give just as
"Unknown user" is a perfectly reasonable response coming from an SMTP
server. This is quite different from a query that has "failed".
This applies equally well to domains that are not registered, as well as to
domains that have no SPF TXT records. Hence:
> dig 12304980.com
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 57543
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; AUTHORITY SECTION:
com. 10800 IN SOA a.gtld-servers.net. nstld.verisign-grs.com. 1077384466 1800 900 604800 900
;; Query time: 132 msec
What is going on here is:
1. Find the NS servers for "." (the root of all DNS). One is:
. 47267 IN NS A.ROOT-SERVERS.NET.
Server probably has this cached, since it is the start of every query in
the world.
2. Find the NS servers for "com." One of these is:
com. 172800 IN NS a.gtld-servers.net.
Server may have this cached, depending on whether you use the "com" TLD
more than once every two days. (That's a joke, son, I say, a joke!)
3. Ask the servers for "com" about "12304980.com" Get NXDOMAIN.
In my case this is done in 132 ms. Most mail servers will do this anyway
to see if the domain is fake. If the domain itself is fake, there is no
reason to proceed with any kind of SPF lookups: they will always come back
nonexistent if the answer for the domain itself was NXDOMAIN.
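
The three steps above can be sketched as a walk down the delegation chain.
This is only an illustration of which zones get consulted and which of those
lookups a warm cache lets you skip; the function names are made up for this
example and are not part of any resolver library.

```python
def delegation_chain(name):
    """Return the zones a resolver walks, root first, for a domain name."""
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = ["."]
    # Build "com.", then "12304980.com.", and so on, from the right-hand side.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


def uncached_steps(name, cached_zones):
    """Zones in the chain whose data is not already in the local cache."""
    return [zone for zone in delegation_chain(name) if zone not in cached_zones]


# With "." and "com." already cached, only the last step costs a round trip.
print(delegation_chain("12304980.com"))
print(uncached_steps("12304980.com", {".", "com."}))
```

With the root and "com." entries cached, the only query that actually leaves
the box is the one that comes back NXDOMAIN.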
The same should be true for a domain that exists but has no TXT records.
> dig microsoft.com txt
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28954
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; AUTHORITY SECTION:
microsoft.com. 3600 IN SOA dns.cp.msft.net. msnhst.microsoft.com. 2004022011 300 600 2419200 3600
;; Query time: 25 msec
Similar to the previous example, "." and "com." NS records should be
cached. So step 1 and 2 are the same and nearly free.
3. Ask the servers for "com." for any NS records for "microsoft.com." Get
multiple answers, one is
microsoft.com. 3600 IN NS dns1.dc.msft.net.
4. Ask the servers for "microsoft.com." for TXT records for
"microsoft.com."
In this case the answer is 0 records and the status is "NOERROR". Not
NXDOMAIN, because microsoft.com exists, it just doesn't have any records of
the type we want.
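
To make the distinction concrete, here is a rough sketch that reads the
header lines of dig output like the samples in this message and sorts the
result into "nonexistent", "no-data", "answer", or "failed". The function
name and the four labels are my own invention for illustration, and the
regular expressions only assume the header layout shown in this message.

```python
import re

STATUS_RE = re.compile(r"status: (\w+)")
ANSWER_RE = re.compile(r"ANSWER: (\d+)")


def classify_dig_output(text):
    """Classify dig output by its header status and ANSWER count."""
    status_match = STATUS_RE.search(text)
    if status_match is None:
        # No header at all, e.g. "connection timed out".
        return "failed"
    status = status_match.group(1)
    count_match = ANSWER_RE.search(text)
    answers = int(count_match.group(1)) if count_match else 0
    if status == "NXDOMAIN":
        return "nonexistent"   # a legitimate, fast answer
    if status == "NOERROR":
        return "answer" if answers > 0 else "no-data"
    return "failed"            # SERVFAIL and anything else unexpected


sample = (
    ";; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 28954\n"
    ";; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0\n"
)
print(classify_dig_output(sample))  # prints "no-data"
```

Only the "failed" bucket is a real failure; the other three are the DNS
doing its job.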
Now, that said, you will sometimes get failures in trying to resolve some
domains, especially if the domain is set up incorrectly or the NS servers
are down, unreachable, or blocking you. This is the case that I would call
"Failure" and it does take considerable time to fail.
Here is one of the failures from my mailserver just now.
reject=451 4.1.8 Domain of sender address college_fm(_at_)printenv(_dot_)com does not
resolve
In this case I get valid answers for the NS records, indicating that it is
registered and has been pointed at two nameservers:
# dig printenv.com ns
printenv.com. 3870 IN NS ns0.arizonnet.com.
printenv.com. 3870 IN NS ns1.arizonnet.com.
but when I try to get answers from either of those servers I get a failure
message:
# dig printenv.com. @ns0.arizonnet.com.
; <<>> DiG 9.2.1 <<>> printenv.com. @ns0.arizonnet.com.
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 34065
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; Query time: 74 msec
This one was fast, which means for some reason the nameserver is up and
running but intentionally giving an "error" response for that domain
(perhaps they are suspended for spamming :)
Here is another failure, only much slower...
reject=451 4.1.8 Domain of sender address LNeely(_at_)opendesk(_dot_)com does not
resolve
Trying to look that up I get
# dig opendesk.com
;; connection timed out; no servers could be reached
after 14.07 seconds. There are two NS servers but neither of them seems to
be answering the phone. This can happen if the DNS is set up badly, either
intentionally or due to error... which is why if I bounce messages from
these domains I give them a message that says 4xx - try again when you have
fixed your dns problem.
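
Roughly, the policy I am describing looks like the sketch below. The 451
text is the one from my log above. The permanent 553 reply for a nonexistent
domain is an assumption added for illustration, as are the function name and
the outcome labels; adjust them to whatever your MTA actually does.

```python
def smtp_reply(sender, outcome):
    """Pick an SMTP reply for a sender-domain lookup outcome.

    outcome is one of "failed", "nonexistent", or anything else for a
    domain that resolved normally (hypothetical labels for this sketch).
    """
    if outcome == "failed":
        # Temporary: the sender may fix their DNS and try again.
        return f"451 4.1.8 Domain of sender address {sender} does not resolve"
    if outcome == "nonexistent":
        # Assumed permanent rejection; NXDOMAIN will not fix itself on retry.
        return f"553 5.1.8 Domain of sender address {sender} does not exist"
    return None  # no objection at this stage, carry on with the transaction


print(smtp_reply("someone@example.com", "failed"))
```

The point of the 4xx is that a broken-DNS domain might be a legitimate
sender having a bad day, so the message stays queued on their side.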
And, even worse, this "failed" query cannot be cached... (even NXDOMAIN
and NOERROR-but-0-records results can be cached.) The best you can do here
is try to set your timeouts as aggressively as possible in the resolver
setup.
This has been an empirical result of our implementation of LMAP based
solutions which include DMP and SPF for the past 3-4 months in production
operations.
This is why we were forced to provide SMTP system operators (sysops) an option
that offers a "list of LMAP domains" to check. Of course, how this list is
generated is not the point.
The point is simply relying on DNS caching is not sufficient. You may
need to also do your own "intelligent" caching and learning of results.
Probably true... but you should also try a few different DNS servers.
Install your own, if you haven't yet. I have a caching DNS server running
on the same system for each high-volume mail server, just as I would with a
web crawler or anything that needs to look up things from the outside world
and not hose my normal nameserver with excessive-query-overload.
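
If you do want your own "intelligent" caching on top of that, one possible
shape is a small table in the mail server that remembers lookup outcomes and
holds real failures for only a short while, so a slow-to-fail domain does not
cost you a long timeout on every message. This is a sketch only: the class
name, the outcome labels, and the default lifetimes are all assumptions, not
anything from an existing implementation.

```python
import time


class LookupCache:
    """Remember lookup outcomes; hold failures only briefly."""

    def __init__(self, ttl=3600, failure_ttl=60, clock=time.monotonic):
        self.ttl = ttl                  # lifetime for normal outcomes (assumed)
        self.failure_ttl = failure_ttl  # short lifetime for "failed" (assumed)
        self.clock = clock
        self._entries = {}

    def get(self, domain):
        """Return the cached outcome, or None if absent or expired."""
        entry = self._entries.get(domain)
        if entry is None:
            return None
        outcome, expires = entry
        if self.clock() >= expires:
            del self._entries[domain]
            return None
        return outcome

    def put(self, domain, outcome):
        """Store an outcome; failures expire sooner than other results."""
        lifetime = self.failure_ttl if outcome == "failed" else self.ttl
        self._entries[domain] = (outcome, self.clock() + lifetime)


cache = LookupCache()
cache.put("opendesk.com", "failed")
print(cache.get("opendesk.com"))  # "failed" until failure_ttl elapses
```

Keeping the failure lifetime short matters, because a domain with broken DNS
may get fixed and you do not want to keep tempfailing it from memory.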
If your DNS server is not on the same machine or the same physical network,
check to see if there are problems passing UDP packets... sometimes UDP
packets above a certain size get dropped along the way, for example by a
firewall that discards fragments or large DNS replies.
I would also recommend that you examine the resolver implementation on the
box itself to see if it is slowing you down (or actually getting things
wrong). I can't think of how, but it's possible.
Just consider that if SPF is going to be widely deployed, there is going
to be a lot of network traffic. We need to do our best to minimize this
at the sysop end and the software end.
I don't know if this is as big a concern as you might think. I think that
doing extra lookups will slow things down on the mail server, but I don't
see it adding that much more bandwidth, for example. A UDP query is much
lower overhead than setting up a TCP session (like some "callback" schemes
do) and is probably about the same as checking reverse-DNS (which is
starting to become common practice among large mail servers.) Also if we
detect forgeries early enough in the transaction and we don't have to
receive their DATA, that might save more bandwidth than we are adding.
Good luck!
gregc
--
Greg Connor <gconnor(_at_)nekodojo(_dot_)org>