Re: short circuiting evaluation
2005-03-24 22:08:17
Andy Bakun wrote:
> On Thu, 2005-03-24 at 15:14, Radu Hociung wrote:
>> Andy Bakun wrote:
>>> This is a very interesting idea, Radu. Couldn't you currently short
>>> circuit your entire eBay compiled record with:
>>> domain.com. TXT "v=spf1 ~exists:%{ir1}._spf.%{d} "
>>> " ...restofrecord... ~all"
>>> X._spf.domain.com. A 127.0.0.1
>>> (with 243 of these records, for all values of X in 0..255 except for
>>> the 13 you've listed that eBay uses)?
>> But ebay's server is doing the compilation, and they might not have an
>> RBL-like map (which is what the exists mechanism implies).
> In what way does the exists mechanism imply that there is an RBL behind
> it?
Uh... my bad, I got too used to thinking RBL anytime a pseudo-hostname
was assembled and then used to do a query. Sorry.
>> They'd have to publish:
>> 1._spf.ebay.com A 127.0.0.?
>> ...
>> 254._spf.ebay.com A 127.0.0.?
> Yes, that is exactly what I said, except for the 13 entries that prefix
> the IP blocks they have valid senders in. If the records to publish are
> generated by the compiler, this isn't that big a deal.
Correct. Not a big deal to generate lots of records. :)
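A compiler could emit them with a few lines of script. A rough sketch in
Python; the set of sender octets below is illustrative, not ebay's real
list:

    # Emit the A records that implement the exists: short-circuit: every
    # first octet with NO valid senders gets an entry, so the exists:
    # lookup matches there and the rest of the record never runs.
    sender_octets = {64, 66, 67, 216}   # illustrative, NOT the real list
    for octet in range(256):
        if octet not in sender_octets:
            print(f"{octet}._spf.ebay.com. 3600 IN A 127.0.0.1")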
>> Each of these would have the same TTL as the compiled SPF record, i.e.,
>> the minimum TTL seen in the convenient SPF record. Let's assume that is
>> 1 hour. I'll move on, but keep this fact in mind.
>> Also, this exists mechanism would likely generate a DNS packet across
>> the net, because the host with %{ir1} is probably not in the cache.
>> After forgers from all corners of the world send me "ebay" email, my
>> cache would have 243 junk entries.
> So would it exist in the cache or wouldn't it?
Everything is written to the cache the first time it's fetched. But if
it expires before being needed again, no traffic is saved, so caching
bought us nothing. Therefore, it is as good as uncacheable. Perhaps we
should define the acronym AGAU to mean "as good as uncacheable".
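A toy model makes the point: a cached record only saves traffic if it is
asked for again before its TTL runs out. A sketch in Python, with
hour-granularity timestamps:

    # Count how often a cache answers, given request times and a TTL.
    def cache_hits(arrival_times, ttl):
        hits, expires_at = 0, float("-inf")
        for t in sorted(arrival_times):
            if t < expires_at:
                hits += 1             # served from cache
            else:
                expires_at = t + ttl  # miss: fetch, entry expires later
        return hits

    # One lookup per hour against a 1-hour TTL: zero hits.  AGAU.
    print(cache_hits(range(24), ttl=1))   # -> 0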
> Those wouldn't be "junk entries" during a zombie attack. They would
> serve the purpose of not having to evaluate the entire SPF record, no
> matter how complex it is (all ip4 or some mix of other mechanisms).
> In addition, as I said, _if_ one was to put a stunt DNS server behind
> that _spf subdomain, the stunt DNS server could serve DIFFERENT cache
> expiration times based on the load it is seeing. During low load times,
> it could serve zero, so it doesn't pollute the receiver's cache. During
> zombie attacks, it could increase it so the SPF checkers don't need to
> perform DNS-load-increasing queries all the time.
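For concreteness, I read that as something like the following; a
hypothetical sketch in Python, with made-up thresholds (you haven't
specified any):

    # Hypothetical "stunt" DNS server TTL policy: the TTL served for the
    # _spf records depends on the query load currently being seen.
    def choose_ttl(queries_per_second):
        if queries_per_second < 10:
            return 0      # quiet: don't pollute receiver caches at all
        if queries_per_second < 1000:
            return 300    # busy: let caches absorb some of the load
        return 3600       # zombie attack: push the work onto the caches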
I think changing the TTL based on load is extremely dangerous. I assume
you mean that the TTL is increased when the load increases. So if
something fails under the heavier load and you have to relocate it,
you'll suffer longer downtime because the TTLs are longer. It would be
like stabbing it, then twisting the dagger around a few turns for good
measure. I'll think about whether it would actually help in this
scenario, but for a typical case, say the TTL of a web-server address,
it's definitely a bad idea to increase the TTL as you're pushing the
equipment closer to failure. By failure I don't necessarily mean
hardware failure; maybe the log files fill up the disk and some critical
service crashes and corrupts the database. You'd have to switch to a
different server while the failed one is restored. And then you find out
that the TTL is much longer, so you're down for longer just when the
sales storm was going on. Yikes!!
>> Also, I was suggesting that the compiler would generate far narrower
>> masks. I listed a few 8-bit ones that I noticed manually. I spent no
>> effort to make them better.
>> So if there is a forger at 65.0.0.1 and ebay uses 65.12.12.12, its
>> exists mechanism cannot return a positive for 65._spf.ebay.com, or it
>> would shoot down its own outgoing server. In order not to do this, it
>> would have to publish %{ir2}._spf.%{d}, but this is not very flexible,
>> as it cannot generate an arbitrarily tight blackout pattern the way
>> the mask can.
> The exact optimization I was suggesting with this is exactly what the
> subject line says: short-circuit the rest of the record. A single query
> can rule out the evaluation of the rest of the record, no matter what
> the rest of the record is; as such, the number of queries required for
> a significant portion of the internet is 1. Putting the negation of
> your range in zone files is necessary to implement this because there
> is, unfortunately, no way to negate a check. That is:
> -a:64/8
> means
> if the IP starts with 64, then it is not authorized
> not
> if the IP doesn't start with 64, then it is not authorized
> But you can do the latter with:
> -exists:%{ir1}.%{d}
> and add RRs for all the entries that are not allowed.
> This means you can put more expensive entries after the exists, and use
> a shorter, non-chained record, which results in fewer queries also.
> Your ebay example:
> ebay.com TXT "v=spf1 ip4:66.135.195.180 ip4:66.135.195.181
> ip4:66.135.209.192/27 ip4:66.135.197.0/27 redirect=_s0.%{o}"
> _s0.ebay.com TXT "v=spf1 ip4:64.4.240.64/27 ip4:64.4.244.64/27
> ip4:66.135.215.224/27 ip4:216.33.244.96/27 redirect=_s1.%{o}"
> _s1.ebay.com TXT "v=spf1 ip4:216.33.244.84 ip4:67.72.99.26
> ip4:206.165.246.83 ip4:206.165.246.84 ip4:206.165.246.85
> redirect=_s2.%{o}"
> _s2.ebay.com TXT "v=spf1 ip4:206.165.246.86 ip4:64.127.115.252
> ip4:194.64.234.129/27 ip4:65.110.161.77 ip4:12.155.144.75
> redirect=_s3.%{o}"
> _s3.ebay.com TXT "v=spf1 ip4:62.22.61.131 ip4:63.104.149.126
> ip4:64.68.79.253 ip4:64.94.204.222 ip4:66.135.215.134 redirect=_s4.%{o}"
> _s4.ebay.com TXT "v=spf1 ip4:67.72.12.29 ip4:80.93.9.10
> ip4:195.234.136.12 ip4:203.49.69.114 ip4:209.63.28.11 redirect=_s5.%{o}"
> _s5.ebay.com TXT "v=spf1 ip4:210.80.80.136 ip4:212.110.10.2
> ip4:212.147.136.123 ip4:213.219.8.227 ip4:216.113.168.128
> redirect=_s6.%{o}"
> _s6.ebay.com TXT "v=spf1 ip4:216.113.175.128 ip4:216.177.178.3
> ip4:217.149.33.234 ip4:220.248.6.124 ip4:67.72.12.30 redirect=_s7.%{o}"
> _s7.ebay.com TXT "v=spf1 ip4:216.113.188.112 ip4:80.66.137.58
> ip4:212.208.64.34 ip4:216.113.188.96 ~all"
> This requires a variable number of queries: up to nine in general,
> exactly nine to find out whether 216.113.188.96 is allowed (since it
> is last in the list), and always nine if the IP is not authorized. My
> alternative of 244 records (1 SPF TXT + 243 A) reduces the number of
> queries to TWO [physical] in the worst case, independent of caching:
> "v=spf1 -exists:%{ir1}._spf.%{d} +mx -all"
Ok, but let's look at a higher connection rate for a second.
If you get 255 connections from around the world, 1 from each class A
net, you have to do 1 query (TXT) + 243 A queries (for the connections
in different class A nets than ebay's servers) + 8 queries for those in
the same class A nets as ebay.
In this case, the cache was proven useful 254 times out of 255 for the
top TXT record, _up_to_ 7 times out of 8 for the remaining TXT records,
and it wasn't useful at all for the exists lookups.
The total traffic across the net was 252 queries.
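(Spelling that arithmetic out:)

    # 255 connections, one per class A net:
    top_txt   = 1     # top record fetched once, cached for everyone else
    exists_a  = 243   # one A query per foreign class A net, none reused
    chain_txt = 8     # _s0.._s7 fetched once, by the ebay-net connections
    print(top_txt + exists_a + chain_txt)   # -> 252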
In my proposal, the mask is delivered with the 1st SPF record. If it is
a good, narrow mask, it will recognize that even the connections from
the same class A nets as ebay's servers are not close enough to ebay's
servers. Say a forger is at 216.113.0.1 and ebay uses 216.113.188.96.
With a tight mask like m=216.113.128.0/17 I can cover all of ebay's
servers around 216.113.x.x (which are listed in records _s5 and on). In
this case, I just saved 5 lookups, because I can tell from record 1 that
the forger is outside my blackout zone.
So in the best case, I only ever need to do 1 lookup, the initial TXT,
and the mask there will predict for me that I don't need to do the other
8 lookups for records _s0 through _s7.
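The check itself is trivial. A sketch of what the receiver would do with
the hypothetical m= modifier (my proposal, not in any spec):

    import ipaddress

    # A sender outside the mask is settled by the top record alone; only
    # senders inside it need the _s0.._s7 lookups.
    mask = ipaddress.ip_network("216.113.128.0/17")
    for sender in ("216.113.0.1", "216.113.188.96"):
        inside = ipaddress.ip_address(sender) in mask
        print(sender, "evaluate rest" if inside else "reject after 1 lookup")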
That was the most fortunate case, when the masks were so good that they
never led me astray. And really, their goodness was only tested 8 times,
because only 8 times did the forger's IP come close enough. So to be
able to withstand 8 predictions is not that extraordinary anyway.
In the worst case, the masks will prove to be no good, and a waste of
the 20 bytes they occupy. In this case, I will have to do 1 query for
the top TXT, plus 8 more for _s0 through _s7. But the query/answer
packets only travel across the internet once each, for a total of 9
queries. The rest of the time, they are returned out of the local cache.
So in this least fortunate case, I had to do all 9 queries.
Most of the time, the masks' quality will be somewhere in between, so
for a scan from across the world, I will pay for between 1 and 9 queries.
If I understand your proposal, the same scan would cost 252 queries
exactly. That's much more expensive.
I think that every time you publish more information, more of it will be
used, and less of it will be reused. The less you publish, the more
likely it is that it will be reused from cache.
Also, note the important difference between the two approaches.
Your approach is open-loop, meaning that you always generate a new query
regardless of the input, while mine is closed-loop, meaning that I take
the incoming IP into account before deciding whether to do a query. I'm
more likely not to generate a new query, since it's more likely that the
mask will cover most incoming IPs.
Another way to put it is that your propensity to generate new queries is
higher than mine. So inevitably, you will generate more queries, and
therefore more traffic.
> 1 for the exists and 1 for the MX (if the entire MX list fits in the
> additional portion of the MX response). In any case, this gains back
> some of the usefulness of the other mechanisms without having to
> recompile (or test for needing to recompile) continually and without
> forcing their complex evaluation in all instances. The cache expire
> time for the records used in exists should definitely be kept low.
Excellent! So let's look at a 24-hour period. Say that we get 2540
connections per hour, 10 from each class A network. Let's assume a TTL
of 24H for the MX, 1H for the exists records, and 1H for the TXT record.
Recall I explained why the exists records have the same TTL as the TXT
record.
Total traffic with your method:
1*MX + 24*TXT + 24*254*A = 6121 queries during the 24H period.
In total, you called ns_resolv 24*2540*3 times (182880 times). So the
cache saved you traffic 96.6% of the time.
With my method, the mask is included at the end of the top-level TXT,
for a total of 9 records with the same TTL of 1H. The records are fully
compiled and contain only ip4 mechanisms and redirects.
Total traffic with my method:
24*1*TXT = 24 queries, if the mask is top notch.
24*9*TXT = 216 queries, if the mask is useless.
More likely the actual number of queries is between 24 and 216.
In total, I called ns_resolv 24*2540*1 = 60960 times if the mask was top
notch, and 24*2540*9 = 548640 times if the mask was crap. So the cache
saved me traffic exactly 99.96% of the time, whether the mask was good
or not.
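(The same numbers, computed end to end:)

    hours, conns_per_hour = 24, 2540

    # Your method: 1 MX (24H TTL) + hourly TXT + hourly A per class A net.
    yours = 1 + hours * 1 + hours * 254
    calls = hours * conns_per_hour * 3
    print(yours, f"{1 - yours / calls:.3%}")     # 6121 queries, 96.653% saved

    # My method: 1 to 9 TXT fetches per hour, depending on mask quality.
    for records in (1, 9):
        mine  = hours * records
        calls = hours * conns_per_hour * records
        print(mine, f"{1 - mine / calls:.3%}")   # 24 or 216, 99.961% saved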
As you can see, there's a huge difference, and most of it is owed to the
fact that the exists records are AGAU, even though 96.6% _looks_ like a
pretty high cache efficiency.
> This is simply trading space (number of records) for time (number of
> queries). It is _another_ way to compile records, which has its
> benefits and trade-offs too.
I'm going to sleep on this for a little while, and see how the exists:
method can be better than the mask method. One obvious way is if all
forger traffic came from the same class A net all the time, _AND_ the
specific address was close enough to the servers that the mask would
miss it. In that case, you'd do 1*MX + 24*TXT + 24*A (49 queries),
while with the mask, I'd do 24*9*TXT (216 queries). This might be the
worst case for the mask. It's pretty unrealistic, though, given all the
restrictive ifs. I'll continue to think of a more realistic scenario
where the exists method would prove more efficient than the mask method.
Whatever we conclude, I really enjoy these thoughtful discussions.
Thanks,
Radu.