Re: short circuiting evaluation
2005-03-24 22:08:17
Andy Bakun wrote:
> On Thu, 2005-03-24 at 15:14, Radu Hociung wrote:
>> Andy Bakun wrote:
>>> This is a very interesting idea, Radu. Couldn't you currently short
>>> circuit your entire eBay compiled record with:
>>> domain.com. TXT "v=spf1 ~exists:%{ir1}._spf.%{d} "
>>> " ...restofrecord... ~all"
>>> X._spf.domain.com. A 127.0.0.1
>>> (with 243 of these records, for all values of X in 0..255 except for
>>> the 13 you've listed that eBay uses)?
>> But ebay's server is doing the compilation, and they might not have an
>> RBL-like map (which is what the exists mechanism implies).
> In what way does the exists mechanism imply that there is an RBL behind
> it?
Uh... my bad, I got too used to thinking RBL anytime a pseudo-hostname
was assembled and then used to do a query. Sorry.
>> They'd have to publish:
>> 1._spf.ebay.com A 127.0.0.?
>> ...
>> 254._spf.ebay.com A 127.0.0.?
> Yes, that is exactly what I said, except for the 13 entries that prefix
> the IP blocks they have valid senders in. If the records to publish are
> generated by the compiler, this isn't that big a deal.
Correct. Not a big deal to generate lots of records. :)
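A compiler could emit them with a few lines of script. A rough sketch in
Python; the set of sender octets below is illustrative, not ebay's real
list:

    # Emit the A records that implement the exists: short-circuit: every
    # first octet with NO valid senders gets an entry, so the exists:
    # lookup matches there and the rest of the record never runs.
    sender_octets = {64, 66, 67, 216}   # illustrative, NOT the real list
    for octet in range(256):
        if octet not in sender_octets:
            print(f"{octet}._spf.ebay.com. 3600 IN A 127.0.0.1")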
>> Each of these would have the same TTL as the compiled SPF record, i.e.,
>> the minimum TTL seen in the convenient SPF record. Let's assume that is
>> 1 hour. I'll move on, but keep this fact in mind.
>> Also, this exists mechanism would likely generate a DNS packet across
>> the net, because the host with %{ir1} is probably not in the cache.
>> After forgers from all corners of the world send me "ebay" email, my
>> cache would have 243 junk entries.
> So would it exist in the cache or wouldn't it?
Everything is written to the cache the first time it's fetched. But if
it expires before being needed again, no traffic is saved, so caching
bought us nothing. Therefore, it is as good as uncacheable. Perhaps we
should define the acronym AGAU to mean "as good as uncacheable".
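A toy model makes the point: a cached record only saves traffic if it is
asked for again before its TTL runs out. A sketch in Python, with
hour-granularity timestamps:

    # Count how often a cache answers, given request times and a TTL.
    def cache_hits(arrival_times, ttl):
        hits, expires_at = 0, float("-inf")
        for t in sorted(arrival_times):
            if t < expires_at:
                hits += 1             # served from cache
            else:
                expires_at = t + ttl  # miss: fetch, entry expires later
        return hits

    # One lookup per hour against a 1-hour TTL: zero hits.  AGAU.
    print(cache_hits(range(24), ttl=1))   # -> 0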
> Those wouldn't be "junk entries" during a zombie attack. They would
> serve the purpose of not having to evaluate the entire SPF record, no
> matter how complex it is (all ip4 or some mix of other mechanisms).
> In addition, as I said, _if_ one was to put a stunt DNS server behind
> that _spf subdomain, the stunt DNS server could serve DIFFERENT cache
> expiration times based on the load it is seeing. During low load times,
> it could serve zero, so it doesn't pollute the receiver's cache. During
> zombie attacks, it could increase it so the SPF checkers don't need to
> perform DNS-load-increasing queries all the time.
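For concreteness, I read that as something like the following; a
hypothetical sketch in Python, with made-up thresholds (you haven't
specified any):

    # Hypothetical "stunt" DNS server TTL policy: the TTL served for the
    # _spf records depends on the query load currently being seen.
    def choose_ttl(queries_per_second):
        if queries_per_second < 10:
            return 0      # quiet: don't pollute receiver caches at all
        if queries_per_second < 1000:
            return 300    # busy: let caches absorb some of the load
        return 3600       # zombie attack: push the work onto the caches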
I think changing the TTL based on load is extremely dangerous. I assume
you mean that the TTL is increased when the load increases. So if
something fails under the heavier load and you have to relocate it,
you'll suffer longer downtime because the TTLs are longer. It would be
like stabbing it, then twisting the dagger around a few turns for good
measure. I'll think about whether it would actually help in this
scenario, but for a typical case, say the TTL of a web-server address,
it's definitely a bad idea to increase the TTL as you're pushing the
equipment closer to failure. By failure I don't necessarily mean
hardware failure; maybe the log files fill up the disk and some critical
service crashes and corrupts the database. You'd have to switch to a
different server while the failed one is restored. And then you find out
that the TTL is much longer, so you're down for longer just when the
sales storm was going on. Yikes!!
>> Also, I was suggesting that the compiler would generate far narrower
>> masks. I listed a few 8-bit ones that I noticed manually. I spent no
>> effort to make them better.
>> So if there is a forger at 65.0.0.1 and ebay uses 65.12.12.12, its
>> exists mechanism cannot return a positive for 65._spf.ebay.com, or it
>> would shoot down its own outgoing server. In order not to do this, it
>> would have to publish %{ir2}._spf.%{d}, but this is not very flexible,
>> as it cannot generate an arbitrarily tight blackout pattern the way
>> the mask can.
> The exact optimization I was suggesting with this is exactly what the
> subject line says: short-circuit the rest of the record. A single query
> can rule out the evaluation of the rest of the record, no matter what
> the rest of the record is; as such, the number of queries required for
> a significant portion of the internet is 1. Putting the negation of
> your range in zone files is necessary to implement this because there
> is, unfortunately, no way to negate a check. That is:
> -a:64/8
> means
> if the IP starts with 64, then it is not authorized
> not
> if the IP doesn't start with 64, then it is not authorized
> But you can do the latter with:
> -exists:%{ir1}.%{d}
> and add RRs for all the entries that are not allowed.
> This means you can put more expensive entries after the exists, and use
> a shorter, non-chained record, which results in fewer queries also.
> Your ebay example:
> ebay.com TXT "v=spf1 ip4:66.135.195.180 ip4:66.135.195.181
> ip4:66.135.209.192/27 ip4:66.135.197.0/27 redirect=_s0.%{o}"
> _s0.ebay.com TXT "v=spf1 ip4:64.4.240.64/27 ip4:64.4.244.64/27
> ip4:66.135.215.224/27 ip4:216.33.244.96/27 redirect=_s1.%{o}"
> _s1.ebay.com TXT "v=spf1 ip4:216.33.244.84 ip4:67.72.99.26
> ip4:206.165.246.83 ip4:206.165.246.84 ip4:206.165.246.85
> redirect=_s2.%{o}"
> _s2.ebay.com TXT "v=spf1 ip4:206.165.246.86 ip4:64.127.115.252
> ip4:194.64.234.129/27 ip4:65.110.161.77 ip4:12.155.144.75
> redirect=_s3.%{o}"
> _s3.ebay.com TXT "v=spf1 ip4:62.22.61.131 ip4:63.104.149.126
> ip4:64.68.79.253 ip4:64.94.204.222 ip4:66.135.215.134 redirect=_s4.%{o}"
> _s4.ebay.com TXT "v=spf1 ip4:67.72.12.29 ip4:80.93.9.10
> ip4:195.234.136.12 ip4:203.49.69.114 ip4:209.63.28.11 redirect=_s5.%{o}"
> _s5.ebay.com TXT "v=spf1 ip4:210.80.80.136 ip4:212.110.10.2
> ip4:212.147.136.123 ip4:213.219.8.227 ip4:216.113.168.128
> redirect=_s6.%{o}"
> _s6.ebay.com TXT "v=spf1 ip4:216.113.175.128 ip4:216.177.178.3
> ip4:217.149.33.234 ip4:220.248.6.124 ip4:67.72.12.30 redirect=_s7.%{o}"
> _s7.ebay.com TXT "v=spf1 ip4:216.113.188.112 ip4:80.66.137.58
> ip4:212.208.64.34 ip4:216.113.188.96 ~all"
> This requires a variable number of queries: up to nine in general,
> exactly nine to find out whether 216.113.188.96 is allowed (since it
> is last in the list), and always nine if the IP is not authorized. My
> alternative of 244 records (1 SPF TXT + 243 A) reduces the number of
> queries to TWO [physical] in the worst case, independent of caching:
> "v=spf1 -exists:%{ir1}._spf.%{d} +mx -all"
Ok, but let's look at a higher connection rate for a second.
If you get 255 connections from around the world, 1 from each class A
net, you have to do 1 query (TXT) + 243 A queries (for the connections
in different class A nets than ebay's servers) + 8 queries for those in
the same class A nets as ebay.
In this case, the cache was proven useful 254 times out of 255 for the
top TXT record, _up_to_ 7 times out of 8 for the remaining TXT records,
and it wasn't useful at all for the exists lookups.
The total traffic across the net was 252 queries.
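(Spelling that arithmetic out:)

    # 255 connections, one per class A net:
    top_txt   = 1     # top record fetched once, cached for everyone else
    exists_a  = 243   # one A query per foreign class A net, none reused
    chain_txt = 8     # _s0.._s7 fetched once, by the ebay-net connections
    print(top_txt + exists_a + chain_txt)   # -> 252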
In my proposal, the mask is delivered with the 1st SPF record. If it is
a good, narrow mask, it will recognize that even the connections from
the same class A nets as ebay's servers are not close enough to ebay's
servers. Say a forger is at 216.113.0.1 and ebay uses 216.113.188.96.
With a tight mask like m=216.113.128.0/17 I can cover all of ebay's
servers around 216.113.x.x (which are listed in records _s5 and on). In
this case, I just saved 5 lookups, because I can tell from record 1 that
the forger is outside my blackout zone.
So in the best case, I only ever need to do 1 lookup, the initial TXT,
and the mask there will predict for me that I don't need to do the other
8 lookups for records _s0 through _s7.
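The check itself is trivial. A sketch of what the receiver would do with
the hypothetical m= modifier (my proposal, not in any spec):

    import ipaddress

    # A sender outside the mask is settled by the top record alone; only
    # senders inside it need the _s0.._s7 lookups.
    mask = ipaddress.ip_network("216.113.128.0/17")
    for sender in ("216.113.0.1", "216.113.188.96"):
        inside = ipaddress.ip_address(sender) in mask
        print(sender, "evaluate rest" if inside else "reject after 1 lookup")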
That was the most fortunate case, when the masks were so good that they
never led me astray. And really, their goodness was only tested 8 times,
because only 8 times did the forger's IP come close enough. So to be
able to withstand 8 predictions is not that extraordinary anyway.
In the worst case, the masks will prove to be no good, and a waste of
the 20 bytes they occupy. In this case, I will have to do 1 query for
the top TXT, plus 8 more for _s0 through _s7. But the query/answer
packets only travel across the internet once each, for a total of 9
queries. The rest of the time, they are returned out of the local cache.
So in this least fortunate case, I had to do all 9 queries.
Most of the time, the masks' quality will be somewhere in between, so
for a scan from across the world, I will pay for between 1 and 9 queries.
If I understand your proposal, the same scan would cost 252 queries
exactly. That's much more expensive.
I think that every time you publish more information, more of it will be
used, and less of it will be reused. The less you publish, the more
likely it is that it will be reused from cache.
Also, note the important difference between the two approaches.
Your approach is open-loop, meaning that you always generate a new query
regardless of the input, while mine is closed-loop, meaning that I take
the incoming IP into account before deciding whether to do a query. I'm
more likely not to generate a new query, since it's more likely that the
mask will cover most incoming IPs.
Another way to put it is that your propensity to generate new queries is
higher than mine. So inevitably, you will generate more queries, and
therefore more traffic.
> 1 for the exists and 1 for the MX (if the entire MX list fits in the
> additional portion of the MX response). In any case, this gains back
> some of the usefulness of the other mechanisms without having to
> recompile (or test for needing to recompile) continually and without
> forcing their complex evaluation in all instances. The cache expire
> time for the records used in exists should definitely be kept low.
Excellent! So let's look at a 24-hour period. Say that we get 2540
connections per hour, 10 from each class A network. Let's assume a TTL
of 24H for the MX, 1H for the exists records, and 1H for the TXT record.
Recall I explained why the exists records have the same TTL as the TXT
record.
Total traffic with your method:
1*MX + 24*TXT + 24*254*A = 6121 queries during the 24H period.
In total, you called ns_resolv 24*2540*3 times (182880 times). So the
cache saved you traffic 96.6% of the time.
With my method, the mask is included at the end of the top-level TXT,
for a total of 9 records with the same TTL of 1H. The records are fully
compiled and contain only ip4 mechanisms and redirects.
Total traffic with my method:
24*1*TXT = 24 queries, if the mask is top notch.
24*9*TXT = 216 queries, if the mask is useless.
More likely the actual number of queries is between 24 and 216.
In total, I called ns_resolv 24*2540*1 = 60960 times if the mask was top
notch, and 24*2540*9 = 548640 times if the mask was crap. So the cache
saved me traffic exactly 99.96% of the time, whether the mask was good
or not.
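(The same numbers, computed end to end:)

    hours, conns_per_hour = 24, 2540

    # Your method: 1 MX (24H TTL) + hourly TXT + hourly A per class A net.
    yours = 1 + hours * 1 + hours * 254
    calls = hours * conns_per_hour * 3
    print(yours, f"{1 - yours / calls:.3%}")     # 6121 queries, 96.653% saved

    # My method: 1 to 9 TXT fetches per hour, depending on mask quality.
    for records in (1, 9):
        mine  = hours * records
        calls = hours * conns_per_hour * records
        print(mine, f"{1 - mine / calls:.3%}")   # 24 or 216, 99.961% saved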
As you can see, there's a huge difference, and most of it is owed to the
fact that the exists records are AGAU, even though 96.6% _looks_ like a
pretty high cache efficiency.
> This is simply trading space (number of records) for time (number of
> queries). It is _another_ way to compile records, which has its
> benefits and trade-offs too.
I'm going to sleep on this for a little while, and see how the exists:
method can be better than the mask method. One obvious way is if all
forger traffic came from the same class A net all the time, _AND_ the
specific address was close enough to the servers that the mask would
miss it. In that case, you'd do 1*MX + 24*TXT + 24*A (49 queries),
while with the mask, I'd do 24*9*TXT (216 queries). This might be the
worst case for the mask. It's pretty unrealistic, though, given all the
restrictive ifs. I'll continue to think of a more realistic scenario
where the exists method would prove more efficient than the mask method.
Whatever we conclude, I really enjoy these thoughtful discussions.
Thanks,
Radu.