Re: Re: DNS load research
2005-03-22 20:45:57
Radu Hociung wrote:
Andy Bakun wrote:
On Tue, 2005-03-22 at 12:46 -0500, Radu Hociung wrote:
I like the idea of weights, but it is a purely academic exercise,
because at run-time it is difficult or impossible to calculate the
real cost of a record. The checker can try estimating it, but the
estimate will probably not be nearly accurate enough to be useful.
This is because of DNS caching.
I was not suggesting that SPF evaluators determine weights at runtime. I
was suggesting that the weights be fixed, relative to each other, as
part of the spec. The weight of a record would be easily calculable
without needing to actually evaluate it. Resolving MXs to IPs takes X
amount more work than resolving As. Count resolving MXs, however
complex they may be (an acceptable average is what would need to be
determined), as costing more than resolving As.
That's great, we're on the same page. This idea of weights is a great
study tool, as it allows us to compare the relative cost of two queries
that otherwise look alike. (a:%{i}.domain.com and a:something.domain.com
have very different traffic costs)
The more I think about this, the more I like it.
It would be fantastic if we could come up with some formulas/methods of
calculating the _relative_ cost of an SPF record.
This cost figure should factor in the following:
- the number of query mechanisms
- the number of DNS packets required between the authoritative NS and
the local caching NS
- the cacheability of the mechanisms, as a function of the macros they use
- the TTL of the various components the SPF record is made up of
- the network latencies to overcome
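As a sketch of the first item, fixed spec-assigned weights could be summed into a static cost without ever evaluating the record. The weight values, the `static_cost` name, and the simple parsing below are all my own invention for illustration; real weights would have to come out of measurement.

```python
# Hypothetical sketch: static "cost" of an SPF record from fixed
# per-mechanism weights. The weights are invented for illustration;
# an mx needs an MX lookup plus one A lookup per exchanger, so it is
# weighted above a plain a; include pulls in a whole new TXT record.
WEIGHTS = {"a": 1, "mx": 3, "ptr": 5, "include": 2, "exists": 1,
           "ip4": 0, "ip6": 0, "all": 0}

def static_cost(record: str) -> int:
    """Sum fixed weights over the mechanisms of one SPF record."""
    cost = 0
    for term in record.split():
        term = term.lstrip("+-~?")             # strip the qualifier
        mech = term.split(":", 1)[0].split("/", 1)[0].lower()
        cost += WEIGHTS.get(mech, 0)           # v=spf1, redirect= score 0 here
    return cost

print(static_cost("v=spf1 mx a:mail.example.com include:other.example -all"))
# -> 6  (mx=3, a=1, include=2)
```

A macro-bearing mechanism such as a:%{i}.domain.com would need a much higher weight than this static sum suggests, since it defeats caching entirely.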
For the TTL item, an SPF record with a TTL of 24H that contains an A
mech with a TTL of 3 hours would be more expensive than a 24H TXT
including a 24H A record, which in turn would be more expensive than a
3-day TXT including a 3-day A record.
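To put that comparison in rough numbers: with a busy shared cache that stays warm, each component is re-fetched about once per TTL, so daily upstream queries are roughly 86400/TTL summed over components. This is a back-of-the-envelope model of my own, not anything from the spec.

```python
# Back-of-the-envelope for the TTL point above: with a warm cache,
# each record component is re-fetched roughly once per TTL, so
# daily upstream queries ~ 86400 / ttl_seconds, summed over components.

def daily_queries(ttls_seconds):
    return sum(86400 / t for t in ttls_seconds)

H, D = 3600, 86400
print(daily_queries([24*H, 3*H]))    # 24h TXT + 3h A  -> 9.0 queries/day
print(daily_queries([24*H, 24*H]))   # 24h TXT + 24h A -> 2.0 queries/day
print(daily_queries([3*D, 3*D]))     # 3d TXT + 3d A   -> ~0.67 queries/day
```

So shortening one component's TTL from 24 hours to 3 hours multiplies the upstream load several times over, which is why the mixed-TTL record ranks as the most expensive of the three.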
For the latencies item, a record that points to 4 A records served by
a NS on a cable modem would be more expensive than a record with 4 A
records hosted on an OC-48 line. It gets worse if the records are
under a different TLD, as the recursive lookups to find that country's
root servers would take longer, and the lookups to the NS itself would
take longer too, because a distant geographical location means
multiple router hops and thus delays. Also, the further away a NS is,
the more likely it becomes that some queries will time out. Remember
that UDP is connectionless, fire-and-forget, and when a packet has to
cross more networks, it has more opportunities to be dropped because
of congestion.
The _relative_ nature of this cost figure would mean that it can be used
to compare two SPF records, but that the figures would only make sense
if used at the same location.
Indeed, to a host in the US, an SPF record in Canada might be cheaper
than one in Hungary; to a host in Poland, the Canadian SPF record
would be more expensive than the Hungarian one.
Indeed at different times of day different records will have different
costs, as the network adapts and re-routes in order to balance the load.
I can only imagine the magic that goes on as the different time zones
start and end their workdays. For instance, on Monday at 9AM, the
network between Florida and New York must be pretty busy, so the
backbone may find a cheaper route for some packets through San
Francisco, where it is 6AM, and most email-checking, internet-browsing
employees are still sleeping.
It's a more difficult problem than it appears.
I don't know exactly what the application for this would be, but I'm
willing to bet that whoever works on this problem will gain a great
amount of knowledge about how The Network operates.
Perhaps if there were a central clearing house for SPF records, as
someone suggested, it could use this information in assessing the
different SPF records it deals with.
The other problem with the DDoS attack I described earlier is that
when the internet is congested with something or other, say with the
next version of MyDoom, the likelihood of UDP packets being dropped
will be much higher than normal, so as MyDoom v2 tries to spread from
mail server to mail server, the recipient MTAs will see more timeouts
because some UDP packets are dropped.
The more queries we allow, the more likely it is that the MTA will time
out while receiving the MyDoom virus.
So not only do the recipient MTAs have much more mail to receive, but
they also spend more time on each message because of the DNS timeouts.
Meanwhile, the sending MTAs can't get the virus out fast enough, so
they queue, and queue, and queue.
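The timeout effect can be sketched with a toy model (my own, not from the thread): each UDP query is dropped with probability p, and the resolver waits a fixed timeout before each retry, so the expected added latency per query is timeout * (p + p^2 + ...). An SPF check multiplies that by however many queries it issues.

```python
# Toy model of DNS timeouts under congestion: each query is dropped
# with probability p; the resolver waits `timeout` seconds before
# each of up to `retries` retries. Expected added latency per query
# is the sum of timeout * p**k over the retry attempts.

def expected_delay(p, timeout, retries):
    return sum(timeout * p**k for k in range(1, retries + 1))

for p in (0.01, 0.10, 0.30):              # normal vs. congested network
    per_query = expected_delay(p, timeout=2.0, retries=3)
    print(f"p={p:.2f}: {per_query:.3f}s per query, "
          f"{10 * per_query:.2f}s extra for a 10-query SPF check")
```

With a 2-second retry timeout, going from 1% to 30% loss takes the added delay per query from about 0.02s to about 0.83s, so a 10-query SPF check soaks up roughly 8 extra seconds of SMTP transaction time per message, exactly when the MTA is busiest.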
I remember that last time, we dealt with the first wave of MyDoom by
getting the MTAs to temporarily reject messages around a certain size.
The next time it happens, we'll do that again (I think we earthlings
are very resistant to learning, but that's a rant for another list).
Also, you can bet that SPF checking will be disabled for a day. And
since it will (rightfully) be perceived to have greatly exacerbated
the problem, it will take some time before it is turned back on, if
confidence in it ever comes back. And confidence will be hard to
regain unless something about it is fixed.
I don't know if there is a blow-up point, or congestion level at which
the congestion grows exponentially, but I think it can get ugly, and it
may be that the mail queues around the world will take a while to clear up.
It looks like a nightmare, but it's gotta be an interesting problem to
look into for a network engineer.
I realize that I'm a bit of a FUD salesman here, but it appears that
we're not giving the risk enough thought yet.
So far, I think SPF is at the risk phase. I don't want it to get to the
problem phase. In my engineering experience I have learnt that the
further you allow a problem to propagate, the more expensive it
becomes to fix, and the cost increases exponentially.
I'm sure that old man in the corner is reading this and thinking,
"You're right, Sunny, I told them that we should add authentication when
we invented SMTP, but they said - Ahh... SMTP will never be abused, it's
too cool of a protocol for that!"
I have a few more years to go, and then I'll be the old man in the
corner. What story will I tell my grandchildren? :)
Radu.