Re: DNS Loading Comparison

On Tue, 12 Apr 2005, at 19:36, David MacQuigg wrote:

It looks like SPF has the potential of being 1000X more efficient than CSV
in both number of DNS queries and cache sizes.  I tried to bring this up on
the CSV mailing list, but they don't want to hear it.  So I bring it up
here, in hopes of getting some check on my assumptions.

Assume:
       2,000 zombies, widely distributed
      50,000 emails from each zombie
 100,000,000 recipient addresses, widely distributed
     100,000 recipient domains
           3 hops from sender to receiver
Then:
   2000 senders --> 3 hops -->  100,000 receivers
      approx. 150,000 MTAs needing to authenticate

Scenario E1:  All DNS queries to rr.com
     Total 150,000 queries, cached for 48 hours

Scenario E2:  DNS queries to 1000 servers, widely distributed
    Typical server:  serv138.austin.rr.com
    150,000 MTAs x 1000 servers = 150,000,000 queries !!
    Client caches are 1000X larger, and 1000X less
      likely to hit.


Since you're using rr.com, I'll respond.

I'm not sure I follow your math here.  While I disagree with your
assumption that there would be 100,000 recipient domains as the
target of these widespread zombies (I think the number would be
much, much smaller, because there's more bang for the buck in
trying to hit 1,000,000 mailboxes at each of 100 large domains
rather than 1,000 mailboxes at each of 100,000 random domains),
I'll go with that bit for now.

First:

Then:
   2000 senders --> 3 hops -->  100,000 receivers
      approx. 150,000 MTAs needing to authenticate


I think this assumption is wrong; I think the reality would more
likely be:

   2000 senders -->  100,000 receivers
     at most 100,000 MTAs needing to check SPF records


Zombies are more likely to attempt direct connections to the
target domains, rather than routing their traffic through the
local SMTP servers.  Were they to all be Road Runner customers,
and were they all to route through our SMTP servers, they'd be
limited to a total of 2,000,000 recipients, as we limit each
residential IP address to sending to 1,000 recipients per day.
(Our outbound servers don't do SPF checking, and they'd only
represent one hop, not three, in this chain, anyway.)
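
To put numbers on that limit, here is a rough sketch in Python.  The
variable names are mine, and it assumes one recipient per message,
which the scenario implies but does not state.

```python
# Figures from the scenario and from the rate limit described above.
ZOMBIES = 2_000
EMAILS_PER_ZOMBIE = 50_000        # what the scenario wants each zombie to send
DAILY_RECIPIENT_LIMIT = 1_000     # per residential IP address, per day

# Assumption: one recipient per message.
recipients_wanted = ZOMBIES * EMAILS_PER_ZOMBIE
recipients_via_relay = ZOMBIES * DAILY_RECIPIENT_LIMIT

print(f"recipients the scenario needs:  {recipients_wanted:,}")
print(f"reachable through our relays:   {recipients_via_relay:,}")
print(f"fraction reachable in one day:  {recipients_via_relay / recipients_wanted:.0%}")
```

Relaying through the provider's servers would cover only a small
fraction of the run in a day, which is one reason direct connections
are the more likely path.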

So, we presume that all 100,000 MTAs do SPF checking, and that the
email connections would not be rejected out of hand for reasons
other than SPF records, before the SPF check was even necessary.

For instance, all Road Runner residential customers now have IP
addresses whose PTR record ends in 'res.rr.com'.  This PTR record
indicates that this is residential dynamic space, and while our
AUP doesn't forbid our customers from running servers in our
dynamic space (cable modems provide a persistence of IP address
assignment due to their always-on nature that can last for
months, meaning that 'dynamic' is a relative term) there are
sites that will reject any connection from our residential space.
This rejection would take place before the "MAIL FROM" part of
the SMTP transaction, thus eliminating the need for SPF checking.
Moreover, zombie infected computers tend to end up on public and
private block lists, so even if the target sites don't reject the
connection based solely on its being from dynamic space, we must
presume that some number of these connections would be rejected,
again before the "MAIL FROM" part of the transaction, due to
their being present on one or more block lists.
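
The ordering described above can be sketched roughly as follows.  This
is only an illustration of a receiving site's policy, not any
particular MTA's code; the function name, the example PTR name, and
the block list contents are all hypothetical.

```python
def connection_verdict(ptr_name, client_ip, block_list, reject_dynamic=True):
    """Decide at connect time, before any MAIL FROM command is seen."""
    # Policy 1: refuse residential dynamic space, identified by PTR suffix.
    if reject_dynamic and ptr_name.rstrip(".").lower().endswith("res.rr.com"):
        return "reject: residential dynamic space"
    # Policy 2: refuse addresses present on a public or private block list.
    if client_ip in block_list:
        return "reject: listed on a block list"
    # Only now would the session reach MAIL FROM, where SPF could apply.
    return "continue to MAIL FROM"


# Hypothetical inputs, using documentation address ranges.
block_list = {"198.51.100.7"}
print(connection_verdict("cpe-192-0-2-10.austin.res.rr.com", "192.0.2.10", block_list))
print(connection_verdict("mail.example.net", "198.51.100.7", block_list))
print(connection_verdict("mail.example.net", "203.0.113.5", block_list))
```

Only the third connection would ever get far enough to cost the
receiver an SPF lookup.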

But let's presume that all 100,000 domains aren't rejecting from
dynamic space, and that they're not using any block lists that
list any Road Runner IPs with zombies.  (We're going to assume
that the three dozen or so Road Runner-managed SMTP servers will
not be infected by zombies, since they don't run the OS that
tends to be infected with zombies.)

Now, there are roughly 75 legitimate DNS domains ending in
rr.com, including the domain rr.com.

In your scenario E1 above, I'm presuming that each of the 50,000
emails from each zombie would purport to be from "user@rr.com".
On the first such message, each of the 100,000 target domains
would query one of the four authoritative servers for rr.com for
the SPF record for rr.com (assume 25,000 queries per server; that
represents well under ten seconds' worth of work for each of
them).  This SPF record has a TTL of one day, so each of the
100,000 target domains would cache the record for one day, and
all remaining SPF queries would be dispersed among the 100,000
local (sets of) DNS caching servers that support the inbound mail
servers at those domains.
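
As a rough check on the E1 arithmetic, here is a small Python sketch.
The names are illustrative, and it assumes one SPF lookup per message,
one shared cache per target domain, and the whole run finishing inside
the one-day TTL.

```python
# Scenario E1 figures from this thread; variable names are illustrative.
ZOMBIES = 2_000
EMAILS_PER_ZOMBIE = 50_000
TARGET_DOMAINS = 100_000
RR_COM_AUTH_SERVERS = 4

# Assumption: one SPF lookup per message, answered from the target
# domain's own cache after the first miss within the one-day TTL.
total_spf_lookups = ZOMBIES * EMAILS_PER_ZOMBIE
authoritative_queries = TARGET_DOMAINS          # one cache miss per domain
queries_per_auth_server = authoritative_queries // RR_COM_AUTH_SERVERS
answered_from_cache = total_spf_lookups - authoritative_queries

print(f"total SPF lookups:            {total_spf_lookups:,}")
print(f"reaching rr.com servers:      {authoritative_queries:,}")
print(f"per authoritative server:     {queries_per_auth_server:,}")
print(f"answered from local caches:   {answered_from_cache:,}")
```

Under those assumptions, only about one lookup in a thousand ever
leaves the receiving site.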

In your scenario E2 above, let's presume that each zombie sends
email from 'user@foo.rr.com'.  Now, for those instances of
'foo.rr.com' where 'foo.rr.com' is not a valid DNS domain, I
presume that the SMTP transaction would be terminated by the
receiving host during the MAIL FROM part *before* SPF checking is
done; most modern MTAs have the capability of validating the
sender domain mentioned in the MAIL FROM part of the transaction,
and any that will reject due to its not existing would do this
before checking SPF records, one presumes, or at least as a side
effect of the check.  That is, if a query for the TXT record for
a domain returns NXDOMAIN, the mail would be rejected not on SPF
grounds, but rather on 'domain does not exist' grounds.  (Do MTAs
that check SPF do a separate query for the SPF record at this
point, or how is it typically handled?)
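
To illustrate the ordering I have in mind, here is a toy sketch in
Python.  It uses an in-memory table instead of real DNS, and the zone
contents, the function names, and the verdict strings are all
hypothetical; it shows one possible receiver policy, not how any
given MTA actually behaves.

```python
# Toy DNS data: domain -> list of TXT strings.  Purely illustrative,
# not the records actually published for these names.
TOY_ZONE = {
    "rr.com": ["v=spf1 ip4:192.0.2.0/24 -all"],
    "austin.rr.com": [],             # exists, but publishes no SPF record
}


class NXDomain(Exception):
    """Raised when the toy resolver has no such domain."""


def lookup_txt(domain):
    if domain not in TOY_ZONE:
        raise NXDomain(domain)
    return TOY_ZONE[domain]


def mail_from_verdict(sender):
    domain = sender.rsplit("@", 1)[-1].lower()
    try:
        txt_records = lookup_txt(domain)
    except NXDomain:
        # The same TXT query that would fetch the SPF record reveals
        # that the domain does not exist, so no SPF evaluation happens.
        return "reject: sender domain does not exist"
    spf = [t for t in txt_records if t == "v=spf1" or t.startswith("v=spf1 ")]
    if not spf:
        return "no SPF record published"
    return "evaluate SPF: " + spf[0]


for sender in ("user@foo.rr.com", "user@austin.rr.com", "user@rr.com"):
    print(sender, "->", mail_from_verdict(sender))
```

In this sketch the nonexistent domain costs one query and is rejected
as a side effect of the lookup, without a separate existence check.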

But let's presume that all 50,000 messages from each zombie are
sent from 'user@foo.rr.com' addresses where 'foo.rr.com' is a
valid rr.com DNS domain.  The 75 valid domains have their DNS
authority spread roughly equally among nine data centers, eight
of which have two authoritative servers, with the ninth having
four, but let's call it two each, to make the math easy.  Let's
presume the worst case: each of the 100,000 target domains gets
email "from" each of the 75 valid domains, and none of the target
domains has any of the 75 SPF records cached.  This means that
our authoritative servers would have to absorb 7.5 million
queries (about 416,667 for each of the 18 servers; still under
three minutes' work per server at the rate assumed for E1); the
remainder of the SPF queries would then again be dispersed among
the local (sets of) DNS servers supporting the mail servers at
the target domains, because they would be caching the SPF records.
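
The worst-case E2 arithmetic can be re-derived with a short Python
sketch.  The names are illustrative, and it keeps the simplification
of two authoritative servers per data center.  Note that 7.5 million
queries spread over 18 servers works out to about 416,667 per server.

```python
# Scenario E2 worst case, using the figures given in this message.
TARGET_DOMAINS = 100_000
VALID_RR_DOMAINS = 75
DATA_CENTERS = 9
SERVERS_PER_DATA_CENTER = 2       # simplification: the ninth really has four

# Assumption: every target domain sees mail "from" every valid domain
# and starts with an empty cache, so each pair costs one query.
auth_servers = DATA_CENTERS * SERVERS_PER_DATA_CENTER
authoritative_queries = TARGET_DOMAINS * VALID_RR_DOMAINS
queries_per_auth_server = round(authoritative_queries / auth_servers)

print(f"authoritative servers:       {auth_servers}")
print(f"queries reaching them:       {authoritative_queries:,}")
print(f"per authoritative server:    {queries_per_auth_server:,}")
```

Even in this worst case the total is 7.5 million queries, a factor of
twenty below the 150,000,000 suggested for scenario E2.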

Am I missing something fundamental about SPF here?  Isn't SPF
record checking all about the sender domain (i.e., the domain
mentioned in the MAIL FROM portion of the SMTP transaction) and
whether or not the connecting IP is allowed to send email for
that domain?  That is, isn't it the MAIL FROM part of the
transaction that triggers the SPF check, not the EHLO/HELO part?
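
As I understand it, the check boils down to something like the sketch
below.  It is deliberately minimal: it handles only the "ip4" and
"all" mechanisms and skips everything else, and the record, the
addresses, and the function name are hypothetical rather than taken
from any real deployment.

```python
import ipaddress

RESULT_FOR_QUALIFIER = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

# Hypothetical published records, keyed by domain (not real rr.com data).
TOY_SPF = {"rr.com": "v=spf1 ip4:192.0.2.0/24 -all"}


def check_mail_from(sender, client_ip, records=TOY_SPF):
    """Return an SPF-style result for the MAIL FROM domain and connecting IP."""
    domain = sender.rsplit("@", 1)[-1].lower()
    record = records.get(domain)
    if record is None:
        return "none"
    ip = ipaddress.ip_address(client_ip)
    for term in record.split()[1:]:           # skip the "v=spf1" version tag
        qualifier = "+"                        # default qualifier
        if term[0] in RESULT_FOR_QUALIFIER:
            qualifier, term = term[0], term[1:]
        if term.startswith("ip4:") and ip in ipaddress.ip_network(term[4:], strict=False):
            return RESULT_FOR_QUALIFIER[qualifier]
        if term == "all":
            return RESULT_FOR_QUALIFIER[qualifier]
    return "neutral"


print(check_mail_from("user@rr.com", "192.0.2.25"))     # inside the listed range
print(check_mail_from("user@rr.com", "198.51.100.7"))   # outside it
```

The only inputs are the domain from MAIL FROM and the connecting IP
address, which is why the number of distinct sender domains, not the
number of sending hosts, drives the DNS load.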

-- 
Todd Herr
Senior Security Policy Specialist/Postmaster      V: 703.345.2447
Time Warner Cable IP Security                     M: 571.344.8619
therr(_at_)security(_dot_)rr(_dot_)com                           AIM: RRCorpSecTH