Re: Re: DNS load research
2005-03-22 20:45:57
Radu Hociung wrote:
Andy Bakun wrote:
On Tue, 2005-03-22 at 12:46 -0500, Radu Hociung wrote:
I like the idea of weights, but it is a purely academic exercise,
because at run-time it is difficult or impossible to calculate the
real cost of a record. The checker can try estimating it, but the
estimate will probably not be nearly accurate enough to be useful.
This is because of DNS caching.
I was not suggesting that SPF evaluators determine weights at runtime. I
was suggesting that the weights be fixed, relative to each other, as
part of the spec. The weight of a record would be easily calculable
without needing to actually evaluate it. Resolving MXs to IPs takes X
amount more work than resolving As. Count resolving MXs, however
complex they may be (an acceptable average is what would need to be
determined), as costing more than resolving As.
That's great, we're on the same page. This idea of weights is a great
study tool, as it allows us to compare the relative cost of two queries
that otherwise look alike. (a:%{i}.domain.com and a:something.domain.com
have very different traffic costs)
The more I think about this, the more I like it.
It would be fantastic if we could come up with some formulas/methods of
calculating the _relative_ cost of an SPF record.
This cost figure should factor in the following:
- the number of query mechanisms
- the number of DNS packets required between the authoritative NS and
the local caching NS
- the cacheability of the mechanisms, as a function of the macros they use
- the TTL of the various components the SPF record is made up of
- the network latencies to overcome
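As a sketch of the first item, fixed spec-assigned weights could be summed into a static cost without ever evaluating the record. The weight values, the `static_cost` name, and the simple parsing below are all my own invention for illustration; real weights would have to come out of measurement.

```python
# Hypothetical sketch: static "cost" of an SPF record from fixed
# per-mechanism weights. The weights are invented for illustration;
# an mx needs an MX lookup plus one A lookup per exchanger, so it is
# weighted above a plain a; include pulls in a whole new TXT record.
WEIGHTS = {"a": 1, "mx": 3, "ptr": 5, "include": 2, "exists": 1,
           "ip4": 0, "ip6": 0, "all": 0}

def static_cost(record: str) -> int:
    """Sum fixed weights over the mechanisms of one SPF record."""
    cost = 0
    for term in record.split():
        term = term.lstrip("+-~?")             # strip the qualifier
        mech = term.split(":", 1)[0].split("/", 1)[0].lower()
        cost += WEIGHTS.get(mech, 0)           # v=spf1, redirect= score 0 here
    return cost

print(static_cost("v=spf1 mx a:mail.example.com include:other.example -all"))
# -> 6  (mx=3, a=1, include=2)
```

A macro-bearing mechanism such as a:%{i}.domain.com would need a much higher weight than this static sum suggests, since it defeats caching entirely.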
For the TTL item, an SPF record with a TTL of 24H that contains an A
mech with a TTL of 3 hours would be more expensive than a 24H TXT
including a 24H A record, which in turn would be more expensive than a
3-day TXT including a 3-day A record.
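To put that comparison in rough numbers: with a busy shared cache that stays warm, each component is re-fetched about once per TTL, so daily upstream queries are roughly 86400/TTL summed over components. This is a back-of-the-envelope model of my own, not anything from the spec.

```python
# Back-of-the-envelope for the TTL point above: with a warm cache,
# each record component is re-fetched roughly once per TTL, so
# daily upstream queries ~ 86400 / ttl_seconds, summed over components.

def daily_queries(ttls_seconds):
    return sum(86400 / t for t in ttls_seconds)

H, D = 3600, 86400
print(daily_queries([24*H, 3*H]))    # 24h TXT + 3h A  -> 9.0 queries/day
print(daily_queries([24*H, 24*H]))   # 24h TXT + 24h A -> 2.0 queries/day
print(daily_queries([3*D, 3*D]))     # 3d TXT + 3d A   -> ~0.67 queries/day
```

So shortening one component's TTL from 24 hours to 3 hours multiplies the upstream load several times over, which is why the mixed-TTL record ranks as the most expensive of the three.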
For the latencies item, a record that points to 4 A records served by
a NS on a cable modem would be more expensive than a record with 4 A
records hosted on an OC-48 line. It gets worse if the records are
under a different TLD, as the recursive lookups to find that country's
root servers would take longer, and the lookups to the NS itself would
take longer too, because a distant geographical location means
multiple router hops and thus delays. Also, the further away a NS is,
the more likely it becomes that some queries will time out. Remember
that UDP is connectionless, fire-and-forget, and when a packet has to
cross more networks, it has more opportunities to be dropped because
of congestion.
The _relative_ nature of this cost figure would mean that it can be used
to compare two SPF records, but that the figures would only make sense
if used at the same location.
Indeed, to a host in the US, an SPF record in Canada might be cheaper
than one in Hungary; to a host in Poland, the Canadian SPF record
would be more expensive than the Hungarian one.
Indeed at different times of day different records will have different
costs, as the network adapts and re-routes in order to balance the load.
I can only imagine the magic that goes on as the different time zones
start and end their workdays. For instance, on Monday at 9AM, the
network between Florida and New York must be pretty busy, so the
backbone may find a cheaper route for some packets through San
Francisco, where it is 6AM, and most email-checking, internet-browsing
employees are still sleeping.
It's a more difficult problem than it appears.
I don't know exactly what the application for this would be, but I'm
willing to bet that whoever works on this problem will gain a great
amount of knowledge about how The Network operates.
Perhaps if there were a central clearing house for SPF records, as
someone suggested, it could use this information in assessing the
different SPF records it deals with.
The other problem with the DDoS attack I described earlier is that
when the internet is congested with something or other, say with the
next version of MyDoom, the likelihood of UDP packets being dropped
will be much higher than normal, so as MyDoom v2 tries to spread from
mail server to mail server, the recipient MTAs will see more timeouts
because some UDP packets are dropped.
The more queries we allow, the more likely it is that the MTA will time
out while receiving the MyDoom virus.
So not only do the recipient MTAs have much more mail to receive, but
they also spend more time on each message because of the DNS timeouts.
Meanwhile, the sending MTAs can't get the virus out fast enough, so
they queue, and queue, and queue.
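The timeout effect can be sketched with a toy model (my own, not from the thread): each UDP query is dropped with probability p, and the resolver waits a fixed timeout before each retry, so the expected added latency per query is timeout * (p + p^2 + ...). An SPF check multiplies that by however many queries it issues.

```python
# Toy model of DNS timeouts under congestion: each query is dropped
# with probability p; the resolver waits `timeout` seconds before
# each of up to `retries` retries. Expected added latency per query
# is the sum of timeout * p**k over the retry attempts.

def expected_delay(p, timeout, retries):
    return sum(timeout * p**k for k in range(1, retries + 1))

for p in (0.01, 0.10, 0.30):              # normal vs. congested network
    per_query = expected_delay(p, timeout=2.0, retries=3)
    print(f"p={p:.2f}: {per_query:.3f}s per query, "
          f"{10 * per_query:.2f}s extra for a 10-query SPF check")
```

With a 2-second retry timeout, going from 1% to 30% loss takes the added delay per query from about 0.02s to about 0.83s, so a 10-query SPF check soaks up roughly 8 extra seconds of SMTP transaction time per message, exactly when the MTA is busiest.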
I remember that last time, we dealt with the first wave of MyDoom by
getting the MTAs to temporarily reject messages around a certain size.
The next time it happens, we'll do that again (I think we earthlings
are very resistant to learning, but that's a rant for another list).
Also, you can bet that SPF checking will be disabled for a day. And
since it will (rightfully) be perceived to have greatly exacerbated
the problem, it will take some time before it is turned back on, if
confidence in it ever comes back. And confidence will be hard to
regain unless something about it is fixed.
I don't know if there is a blow-up point, or congestion level at which
the congestion grows exponentially, but I think it can get ugly, and it
may be that the mail queues around the world will take a while to clear up.
It looks like a nightmare, but it's gotta be an interesting problem to
look into for a network engineer.
I realize that I'm a bit of a FUD salesman here, but it appears that
we're not giving the risk enough thought yet.
So far, I think SPF is at the risk phase. I don't want it to get to the
problem phase. In my engineering experience I have learnt that the
further you allow a problem to propagate, the more expensive it
becomes to fix, and the cost increases exponentially.
I'm sure that old man in the corner is reading this and thinking,
"You're right, Sunny, I told them that we should add authentication when
we invented SMTP, but they said - Ahh... SMTP will never be abused, it's
too cool of a protocol for that!"
I have a few more years to go, and then I'll be the old man in the
corner. What story will I tell my grandchildren? :)
Radu.