spf-discuss

Re: Re: DNS load research

2005-03-22 20:45:57
Radu Hociung wrote:
Andy Bakun wrote:

On Tue, 2005-03-22 at 12:46 -0500, Radu Hociung wrote:


I like the idea of weights, but it is a purely academic exercise, because at run time it is difficult or impossible to calculate the real cost of a record. The checker can try to estimate it, but the estimate will probably not be nearly accurate enough to be useful, because of DNS caching.



I was not suggesting that SPF evaluators determine weights at runtime. I
was suggesting that the weights be fixed, relative to each other, as
part of the spec.  The weight of a record would be easily calculable
without needing to actually evaluate it.  Resolving MXs to IPs takes X
amount more work than resolving As.  Count resolving MXs, however
complex they may be (an acceptable average is what would need to be
determined), more than resolving As.


That's great, we're on the same page. This idea of weights is a great study tool, as it allows us to compare the relative cost of two queries that otherwise look alike. (a:%{i}.domain.com and a:something.domain.com have very different traffic costs)
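As a rough illustration of fixed, spec-style weights that can be read off a record without evaluating it, here is a minimal Python sketch. Everything in it is an assumption made for the example: the weight values, the macro surcharge, and the function name record_weight are hypothetical and come from no SPF draft. The surcharge is there because a macro such as %{i} expands differently for every sender, so the resulting lookup is rarely answered from cache.

```python
import re

# Hypothetical fixed weights in arbitrary units (not taken from any SPF spec).
# Mechanisms that need no DNS lookup cost 0; MX and PTR are assumed to cost
# more than A because they fan out into further lookups.
WEIGHTS = {
    "ip4": 0, "ip6": 0, "all": 0,
    "a": 1, "exists": 1, "include": 1, "redirect": 1,
    "mx": 3, "ptr": 3,
}

# Assumed surcharge for a term containing a macro: its expansion varies per
# message, so a caching resolver can rarely reuse the answer.
MACRO_PENALTY = 5


def record_weight(record: str) -> int:
    """Sum the fixed weights of all terms in an SPF record, without any DNS lookups."""
    total = 0
    for term in record.split()[1:]:          # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")           # drop the qualifier, if any
        name = re.split(r"[:=/]", term, maxsplit=1)[0].lower()
        weight = WEIGHTS.get(name, 0)        # unknown terms count as free here
        if weight and "%{" in term:
            weight += MACRO_PENALTY
        total += weight
    return total


print(record_weight("v=spf1 a:something.domain.com -all"))
print(record_weight("v=spf1 a:%{i}.domain.com -all"))
```

With these made-up numbers the macro form comes out six times as heavy as the plain one. Only the ordering is meaningful, not the values.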

The more I think about this, the more I like it.

It would be fantastic if we could come up with some formulas/methods of calculating the _relative_ cost of an SPF record.

This cost figure should factor in the following:

- the number of query mechanisms
- the number of DNS packets required between the authoritative NS and the local caching NS
- the cacheability of the mechanisms, as a function of the macros they use
- the TTL of the various components the SPF record is made up of
- the network latencies to overcome

For the TTL item, an SPF record with a TTL of 24H that contains an A mech with a TTL of 3 hours would be more expensive than a 24H TXT including a 24H A record, which would be more expensive than a 3-day TXT including a 3-day A record.
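The TTL comparison above can be put into numbers with a simple amortization sketch in Python. It assumes a busy caching resolver that re-fetches each component exactly once every time its TTL expires, and it counts one upstream query per re-fetch. The function name upstream_queries_per_day is made up for the example, and real caches will behave less neatly.

```python
SECONDS_PER_DAY = 86400
HOUR = 3600
DAY = 24 * HOUR


def upstream_queries_per_day(ttls_seconds):
    """Daily re-fetches from the authoritative NS, assuming each component
    is fetched again exactly once per TTL expiry (a simplifying assumption)."""
    return sum(SECONDS_PER_DAY / ttl for ttl in ttls_seconds)


# Each list holds the TTL of the TXT record followed by the TTL of its A record.
case_a = upstream_queries_per_day([DAY, 3 * HOUR])     # 24H TXT with a 3H A
case_b = upstream_queries_per_day([DAY, DAY])          # 24H TXT with a 24H A
case_c = upstream_queries_per_day([3 * DAY, 3 * DAY])  # 3-day TXT with a 3-day A

print(case_a, case_b, case_c)
```

Under that assumption the three cases cost 9, 2 and about 0.67 upstream queries per day, which is the same ordering as in the paragraph above.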

For the latencies item, a record that points to 4 A records served by a name server on a cable modem would be more expensive than a record with 4 A records hosted on an OC-48 line. It gets worse if the records are under a different TLD, as the recursive lookups to find that country's root servers would take longer, and the lookups to the NS itself would take longer too, because a distant geographical location means more router hops and thus more delay. Also, the further away an NS is, the more likely it becomes that some queries will time out. Remember that UDP is connectionless, send-and-forget, and the more networks a packet has to cross, the more opportunities it has to be dropped because of congestion.

The _relative_ nature of this cost figure would mean that it can be used to compare two SPF records, but that the figures would only make sense if used at the same location.

Indeed, an SPF record in Canada might be cheaper than one in Hungary to a host in the US, while to a host in Poland the Canadian SPF record would be more expensive than the Hungarian one.

Likewise, at different times of day different records will have different costs, as the network adapts and re-routes in order to balance the load. I can only imagine the magic that is going on as the different time zones start and end their work days. For instance, Monday at 9AM, the network between Florida and New York must be pretty busy, so the backbone may find a cheaper route for some packets through San Francisco, where it is 6AM, and most email-checking, internet-browsing employees are still sleeping.


It's a more difficult problem than it appears.

I don't know exactly what the application for this would be, but I'm willing to bet that whoever works on this problem will gain a great amount of knowledge about how The Network operates.

Perhaps if there were a central clearing house for SPF records, as someone suggested, it could use this information in assessing the different SPF records it deals with.


The other problem with the DDoS attack I described earlier is that when the internet is congested with something or other, say the next version of MyDoom, the likelihood of UDP packets being dropped will be much higher than normal. So as MyDoom v2 tries to spread from mail server to mail server, the recipient MTAs will see more timeouts because some UDP packets are dropped.

The more queries we allow, the more likely it is that the MTA will time out while receiving the MyDoom virus.
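To see how quickly this grows, here is a back-of-the-envelope Python sketch. It assumes every DNS query is lost independently with the same probability and that the resolver does not retry, which is cruder than real resolver behavior. The function name p_any_timeout and the 5% loss rate are made up for the example.

```python
def p_any_timeout(n_queries: int, p_loss: float) -> float:
    """Probability that at least one of n_queries times out, assuming each
    query is lost independently with probability p_loss and is never retried."""
    return 1.0 - (1.0 - p_loss) ** n_queries


# Assumed 5% loss per query on a congested network.
for n in (1, 2, 5, 10, 20):
    print(n, round(p_any_timeout(n, 0.05), 3))
```

With a 5% loss rate, a record that needs 10 queries hits at least one timeout about 40% of the time, against 5% for a single query.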

So not only do the recipient MTAs have much more mail to receive, they also spend more time on each message because of the DNS timeouts. Meanwhile, the sending MTAs can't get the virus out fast enough, so they queue, and queue, and queue.

I remember that last time, we dealt with the first wave of MyDoom by getting the MTAs to temporarily reject messages around a certain size.

The next time it happens, we'll do that again (I think we earthlings are very resistant to learning, but that's a rant for another list). You can also bet that SPF checking will be disabled for a day. And since it will be (rightfully) perceived to have exacerbated the problem greatly, it will take some time before it is turned back on, if confidence in it ever comes back. And confidence will be hard to regain unless something about it is fixed.

I don't know if there is a blow-up point, or congestion level at which the congestion grows exponentially, but I think it can get ugly, and it may be that the mail queues around the world will take a while to clear up.

It looks like a nightmare, but it's gotta be an interesting problem to look into for a network engineer.

I realize that I'm being a bit of a FUD salesman here, but it appears that we're not giving the risk enough thought yet.

So far, I think SPF is at the risk phase. I don't want it to get to the problem phase. In my engineering experience I learnt that the further you allow a problem to propagate the more expensive it will be to fix, and the cost increases exponentially.

I'm sure that old man in the corner is reading this and thinking, "You're right, Sonny, I told them that we should add authentication when we invented SMTP, but they said - Ahh... SMTP will never be abused, it's too cool of a protocol for that!"

I have a few more years to go, and then I'll be the old man in the corner. What story will I tell to my grandchildren ? :)

Radu.

