Frank Ellermann wrote:
Each mx / ptr / %p has its own limit of 10 MXs or 10 PTRs,
also a MUST. If you _add_ all MX queries for different
mx mechanisms, you get completely different results.
That is correct. Let's call the real limit of spf-classic-00
the *DoS limit*. For all practical purposes, it currently stands at 111.
That's what the worst case SPF will cost: 111. I would think that
as the SPF evaluation approaches 111 lookups, the probability of
a PermError increases, so chances are good that 111 lookups will
have been a waste of time and resources. This is why this limit
is called a DoS limit.
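As a sanity check on that figure, here is one plausible way the 111 worst case could be accounted for; the exact breakdown is my own reading of spf-classic-00, not something the draft spells out:

```python
# Hypothetical accounting for the 111-lookup worst case:
#   1   initial fetch of the SPF record itself
# + 10  lookup-causing mechanisms (include/a/mx/ptr/exists/redirect)
# + 100 each of those 10 mechanisms being an mx (or ptr) that
#       returns the per-mechanism maximum of 10 MX/PTR names
worst_case = 1 + 10 + 10 * 10
print(worst_case)  # 111
```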
I'm going to look a little bit into the two limits I believe to
be needed: the *DoS* limit, and the *reliability* limit.
*About the DoS limit*
The large domains probably employ mail servers that run at a
calculated load of around 50% resource utilization (a guess).
(I'm thinking of those with tens of primary MXs.) As it is
now, those servers probably do about 3 DNS lookups for every
incoming email, give or take a couple of lookups:
- a lookup on the sender domain, to catch fake domains
- a PTR lookup, to see if the IP is at least likely to be a
  legitimate mail sender
- a lookup to an RBL or something similar
If implementing SPF required them to absorb a 3700% increase
in the maximum load on those machines, that would be a major
headache. In order to deliver the _same mail volume_, they
would need to grow their front-end infrastructure 37x, because
they would have to guarantee the same level of delivery even
during a sustained DoS attack. It also means that on an average
day, without a major DoS attack, their mail machines would run
at 50%/37, or about 1.35% utilization. From a financial
investment point of view, this is likely not a good value
proposition.
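The arithmetic above, using the post's numbers (3 baseline lookups per mail, a 111-lookup SPF worst case, and an assumed 50% average utilization), works out like this:

```python
baseline = 3       # DNS lookups per incoming mail today (per the text)
dos_limit = 111    # worst-case SPF lookup cost
avg_load = 50.0    # assumed average utilization, in percent

factor = dos_limit / baseline        # lookup load multiplier: 37x
quiet_day_load = avg_load / factor   # utilization after 37x over-provisioning

print(factor, round(quiet_day_load, 2))  # 37.0 1.35
```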
*Reliability minimum limit*
This said, I don't think there is anything wrong with having a
*DoS limit*. However, there should also be a lower *minimum
limit*.
The same busy site would want to do a number of lookups that
would guarantee reliable SPF evaluation of a sender. This is
the limit that says 'if you do at least N lookups, you will not
misdiagnose any legitimate mail coming from properly
configured domains'. I think this number should be 10 or less.
That in turn would mean at most 12-14 lookups for each
incoming mail:
- 1 for the PTR
- 10 for the SPF
- 1 for the blacklist
- give or take a couple for other white/black lists
This represents only a 400% increase, and the average load of
the front-end servers would drop to 12.5%. Still a very poor
value proposition, but much more acceptable than the 1.35%
proposition.
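Redoing the same calculation with the proposed reliability limit of 10 SPF lookups (plus the PTR and RBL lookups from the list above):

```python
baseline = 3                    # DNS lookups per incoming mail today
per_mail = 1 + 10 + 1           # PTR + SPF reliability limit + RBL = 12
                                # ("give or take a couple" for other lists)
avg_load = 50.0                 # assumed average utilization, in percent

factor = per_mail / baseline    # 4x the lookup load, i.e. the "400%"
quiet_day_load = avg_load / factor

print(factor, quiet_day_load)   # 4.0 12.5
```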
*My conclusions*
1. I think the DoS limit could be used to blacklist domains. Any
   domain that generates a 111-lookup load should be blacklisted,
   so it is never looked up again in the future. A well-configured,
   legitimate domain would never be blacklisted, because its SPF
   record would under no circumstances hit the *DoS* lookup limit.
2. Another, minimum limit is needed: how many lookups are
   reasonably needed to *reliably* prove authenticity? I think
   no more than 10. This is the number that will drive
   infrastructure adjustments.
3. Any SPF record that falls between the two limits should
   result in unreliable authentication. A PermError caused by
   exceeding the *minimum* limit would be treated cautiously
   (as a 'none', perhaps), while a PermError caused by exceeding
   the *DoS* limit would be treated as a 'fail'.
4. If you agree on the need for a second limit, I also conclude
the following:
* The spec should softly recommend the *DoS* limit be
implemented (MAY or SHOULD)
* It should strongly recommend the *reliability limit*
(SHOULD or MUST)
* A large gap is not necessary, and the *DoS* limit should be
decreased to something more reasonable, perhaps between
20 and 40
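The two-limit policy from point 3 could be sketched as follows; the limit values and function names here are illustrative (the DoS limit uses a hypothetical value from the 20-40 band proposed above):

```python
RELIABILITY_LIMIT = 10  # exceeding this: unreliable, treat cautiously
DOS_LIMIT = 30          # exceeding this: DoS-range record, treat as fail
                        # (hypothetical value within the proposed 20-40 band)

def classify(lookups_used: int, spf_result: str) -> str:
    """Map a PermError to a policy outcome based on which limit was hit.

    A sketch of conclusion 3, not an implementation of any spec.
    """
    if spf_result != "PermError":
        return spf_result
    if lookups_used > DOS_LIMIT:
        return "fail"       # could also trigger blacklisting (conclusion 1)
    if lookups_used > RELIABILITY_LIMIT:
        return "none"       # between the limits: authentication unreliable
    return "PermError"      # PermError for some unrelated reason

print(classify(35, "PermError"))  # fail
print(classify(15, "PermError"))  # none
print(classify(5, "pass"))        # pass
```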
Greetings,
Radu.