spf-discuss

Re: short circuiting evaluation

2005-03-25 11:38:56
Andy Bakun wrote:
This TTL discussion on the A records that exists uses is academic, as
I've changed my position on it absolutely needing to be low.

I think changing the TTL based on load is extremely dangerous. I assume you mean that the TTL is increased when the load increases.


The TTL I was suggesting for these exists: records is either around zero
or significantly smaller than you'd normally set the TTL to if you
wanted to avoid significant downtime, such that even an increase like 2x
or 3x is still less than 5 or 10 minutes.

If you're getting hammered with queries at x per second with a TTL of 10
seconds, then your load would be x/2 if you increase the TTL to 20
seconds.  This isn't significantly different, but when there's an attack
going on, this would reduce your load without significantly hindering
your ability to fail over.
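The load arithmetic above can be sketched as a back-of-the-envelope model. This is my own illustration, assuming the authoritative server's load is bounded by cache expiry at each resolver; the resolver count and rates are made-up numbers:

```python
def steady_state_qps(client_qps: float, ttl_s: float, resolvers: int) -> float:
    """Upper bound on the query rate reaching the authoritative server:
    each caching resolver refreshes the record at most once per TTL."""
    return min(client_qps, resolvers / ttl_s)

# Made-up attack numbers: 1000 distinct caching resolvers, demand far
# above what the caches absorb.
load_ttl10 = steady_state_qps(1000.0, 10.0, 1000)   # TTL 10s
load_ttl20 = steady_state_qps(1000.0, 20.0, 1000)   # TTL 20s: half the load
```

Doubling the TTL halves the steady-state load, exactly as described, without changing anything else.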


So if something fails under the heavier load and you have to relocate it, you'll suffer longer downtime because the TTLs are longer.


There's also nothing keeping it from going the other way -- set the TTL
to 1 hour in the normal case.  If it looks like your load is up because
your domain is being forged, decrease it so that in the case you do need
to fail over, your downtime window is decreased.  That is, if you've
been serving TTLs of 1 hour, and you change to 30 minutes, then cache
entries will expire in an average of 45 minutes, thereby reducing the
length of your downtime/inaccessibility due to caching.

The instant you publish a 30-minute record, all the records you've issued thus far have been 1-hour, so all records out in the field carry 1-hour expirations. Changing the published TTL does not instantly change the average TTL out there; it takes up to 1 hour for all the previously issued records to expire and be refreshed with records carrying the new TTL.

Incidentally, if you have a spread of 1-hour TTLs out in the field, their average time-to-refresh, or remaining TTL, is 30 minutes. If you have a spread of 30-minute TTLs, the average time-to-refresh is 15 minutes. So if you do what you said, the average TTL out there would drop from 30 minutes to 15 minutes over the course of 1 hour.
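The averaging argument can be checked numerically. A minimal sketch, assuming caches fetch the record at uniformly random times, so remaining lifetime is uniform on [0, TTL]:

```python
# A record with TTL t, fetched at a uniformly random moment, has a
# remaining lifetime uniform on [0, t]; the mean time-to-refresh is t/2.
def mean_time_to_refresh(ttl_minutes: float) -> float:
    return ttl_minutes / 2.0

avg_before = mean_time_to_refresh(60)   # 1-hour TTLs: 30 minutes on average
avg_after = mean_time_to_refresh(30)    # 30-minute TTLs: 15 minutes on average
```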


I'll think about whether it would actually help in this scenario, but for a typical case, say the TTL of a web-server address, it's definitely a bad idea to increase the TTL as you're pushing the equipment closer to failure.


Sure, except we are specifically not talking about "typical cases" here,
and especially not web-servers.  If I send email and your server is
overloaded, I may get a DSN from my server saying it temporarily can't
connect, but assuming your load comes down in a reasonable amount of
time, the mail will go through with no action on anyone else's part.
Web servers are really different: if your retail website isn't
responding, people will immediately go to your competitors.

In the "typical case", email goes through.  The atypical case is the
mythical SPF-doom virus that is pounding on mail servers causing mail
servers to pound on DNS through SPF.  I thought we were trying to
optimize for the atypical case here.

Slight correction: The virus _taps_ on MTAs, and MTAs _pound_ on DNS. ;)

You're right, the mail situation is somewhat different. While I consider a 4-hour delay in email unacceptable, ymmv. 4 hours is the default time after which sendmail warns that it hasn't been able to deliver, but when it gives the warning, it's still unknown how much longer it will have to keep trying.

But let's not concentrate on that since it takes energy away from the real discussion. We can start a separate thread for fail-over if you like.

Ok, but let's look at a higher connection rate for a second.

If you get 255 connections from around the world, 1 from each class A net, you have to do 1 query (TXT) + 243 A (for those which are in different class A nets than ebay's servers) + 8 queries for those that are in the same class A nets as ebay.


This is a fine thought experiment, but most likely not that realistic. Chances are, most, if not all, zombied machines are going to come from
some small (and maybe even predictable) set of class A addresses during
any given single attack.  It seems most of the single digit class As are
out immediately, for example.  I'd think large attacks are going to come
from IP blocks that are hosting connectivity services sold to the
public/consumers.

Very well. I used my 3-month maillog as a research resource again, and found that my mail server's port 25 is accessible from most corners of the world. Below I show the distribution of incoming connections from hosts in different class A nets.

My host sees very modest volumes of SMTP activity, and thus checks very few SPF records; my total number of connections per day is only about 48. A more central mail server would see far more connections.

To estimate what a more central mail server might see, I multiplied the number of connections I see by a factor of 20 (still very modest). Hotmail bounces 2 billion spams a day; a site that large would have to do far more exists queries than my setup*20, and may get much closer to the worst case I described before. But let's see how close a very modest _realistic_ case would get.

Anyway, I have applied the distribution I saw at my server, but at 20x the volume, and I will compare how many queries your exists method might require vs. how many the mask method might require:

Please note that multiplying the number of connections does not change the shape of the distribution. For A nets that I saw no connections from, the multiplication will not create any connections. However, for nets where I get a small number of daily connects on average, that daily average is increased 20x (e.g., from net 212 I see about 0.3 connections per day; after the volume increase, about 7 per day: the table rounds 6.7 up to 7 for net 212).

Column A is the Class A net.
Column B is the total number of connections from an IP in that net that I have seen in the last 3 months
Column C is the percentage out of total number of connects.
Column D is the average daily number of connects from that net (col B/90)

Column E is the estimated number of connects for a more central site (ohmi's average daily connects * 20). I will refer to this as connections_per_24H.

For the exists column (F), I have used the following formula:

    if (connections_per_24H > 24) queries = 24;
    else queries = connections_per_24H

Column G is the number of MX queries you'd have to do. I have assumed that we keep to the ebay example. In that context, your published TXT record would be "v=spf1 -exists:%{ir1}._spf.%{d} +... the long list of IPs, spread over the same number of _s extensions as my example". Since my mask takes 61 bytes and your mechanism takes 24 bytes, in your case the ebay record may be shorter by 1 TXT query.

Column H is the number of TXT queries that you'd have to do with the exists method (=8*column_G). I gave you the benefit of the doubt and assumed that your functionally identical record to mine would be 8 TXT records instead of 9, because of the shorter mask string.

For column J I used the number of TXT queries that my previous ebay record would generate (=9*column_G) if we used the mask method, with a mask equivalent to your "exists mask":

-m=64/6  m=80 m=194 m=203 m=206 m=209 m=210 m=212 m=216 m=220

I have already pointed out that this mask is very poor; a better mask would have higher CIDRs and therefore be more narrow. But I want to compare apples to apples. The information published by the DNS server is the same in both cases, just published in different ways. In my way, it is published as the 61-byte string above, appended to the top TXT record at ebay.com; in your case, it would be published as the 254-13=241 A records generated by the compiler. Actually, in both cases the information is generated by the compiler.

Since my mask can be made narrower without adding extra queries, the numbers in column J should read "<=", i.e. _at most_ 9 queries. Nonetheless, the total for column J is based on 9 queries. It looks like in the past 3 months I got no connects from one of the nets that ebay uses, otherwise the total would have been 9*24.
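For reference, the per-net columns in the table below can be reproduced with a short sketch. This is my reconstruction of the formulas above; EBAY_NETS is simply the set of nets showing a nonzero column G in the table:

```python
# Class A nets with a nonzero column G in the table below
# (the nets the ebay record sends from and that I saw connects from).
EBAY_NETS = {64, 65, 66, 67, 80, 194, 203, 206, 209, 210, 212, 216}

def columns(net: int, conn_per_24h: float):
    f = min(round(conn_per_24h), 24)   # column F: exists queries, 1h TTL caps at 24
    g = f if net in EBAY_NETS else 0   # column G: connects that walk the TXT chain
    h = 8 * g if g else 1              # column H: exists method, 8 TXT records
    j = 9 * g if g else 1              # column J: mask method, 9 TXT records
    return f, g, h, j

# Net 210 sees 14.7 connections/day in the scaled-up data:
row_210 = columns(210, 14.7)   # matches the table row: (15, 15, 120, 135)
```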

    A      B     C     D       E      F   G   H    J
    65    849  19.6%  9.4    188.7   24   24  192  216
    66    562  13.0%  6.2    124.9   24   24  192  216
    61    208  4.8%   2.3    46.2    24   0   1    1
    24    177  4.1%   2.0    39.3    24   0   1    1
    211   177  4.1%   2.0    39.3    24   0   1    1
    218   153  3.5%   1.7    34.0    24   0   1    1
    200   130  3.0%   1.4    28.9    24   0   1    1
    69    112  2.6%   1.2    24.9    24   0   1    1
    220   110  2.5%   1.2    24.4    24   0   1    1
    82    108  2.5%   1.2    24.0    24   0   1    1
    202   101  2.3%   1.1    22.4    22   0   1    1
    222   97   2.2%   1.1    21.6    22   0   1    1
    68    89   2.1%   1.0    19.8    20   0   1    1
    221   85   2.0%   0.9    18.9    19   0   1    1
    81    74   1.7%   0.8    16.4    16   0   1    1
    207   69   1.6%   0.8    15.3    15   0   1    1
    210   66   1.5%   0.7    14.7    15   15  120  135
    132   63   1.5%   0.7    14.0    14   0   1    1
    219   59   1.4%   0.7    13.1    13   0   1    1
    38    57   1.3%   0.6    12.7    13   0   1    1
    67    56   1.3%   0.6    12.4    12   12  96   108
    80    56   1.3%   0.6    12.4    12   12  96   108
    213   56   1.3%   0.6    12.4    12   0   1    1
    83    54   1.2%   0.6    12.0    12   0   1    1
    4     51   1.2%   0.6    11.3    11   0   1    1
    62    49   1.1%   0.5    10.9    11   0   1    1
    217   48   1.1%   0.5    10.7    11   0   1    1
    64    44   1.0%   0.5    9.8     10   10  80   90
    84    43   1.0%   0.5    9.6     10   0   1    1
    203   42   1.0%   0.5    9.3     9    9   72   81
    216   38   0.9%   0.4    8.4     8    8   64   72
    201   36   0.8%   0.4    8.0     8    0   1    1
    206   36   0.8%   0.4    8.0     8    8   64   72
    209   33   0.8%   0.4    7.3     7    7   56   63
    192   32   0.7%   0.4    7.1     7    0   1    1
    194   31   0.7%   0.3    6.9     7    7   56   63
    212   30   0.7%   0.3    6.7     7    7   56   63
    85    22   0.5%   0.2    4.9     5    0   1    1
    60    21   0.5%   0.2    4.7     5    0   1    1
    12    20   0.5%   0.2    4.4     4    0   1    1
    70    19   0.4%   0.2    4.2     4    0   1    1
    168   17   0.4%   0.2    3.8     4    0   1    1
    63    15   0.3%   0.2    3.3     3    0   1    1
    204   13   0.3%   0.1    2.9     3    0   1    1
    205   13   0.3%   0.1    2.9     3    0   1    1
    59    11   0.3%   0.1    2.4     2    0   1    1
    195   10   0.2%   0.1    2.2     2    0   1    1
    129   8    0.2%   0.1    1.8     2    0   1    1
    198   8    0.2%   0.1    1.8     2    0   1    1
    193   7    0.2%   0.1    1.6     2    0   1    1
    151   6    0.1%   0.1    1.3     1    0   1    1
    163   6    0.1%   0.1    1.3     1    0   1    1
    131   5    0.1%   0.1    1.1     1    0   1    1
    144   5    0.1%   0.1    1.1     1    0   1    1
    208   5    0.1%   0.1    1.1     1    0   1    1
    130   4    0.1%   0.0    0.9     1    0   1    1
    141   4    0.1%   0.0    0.9     1    0   1    1
    71    3    0.1%   0.0    0.7     1    0   1    1
    136   3    0.1%   0.0    0.7     1    0   1    1
    148   3    0.1%   0.0    0.7     1    0   1    1
    161   3    0.1%   0.0    0.7     1    0   1    1
    166   3    0.1%   0.0    0.7     1    0   1    1
    196   3    0.1%   0.0    0.7     1    0   1    1
    128   2    0.0%   0.0    0.4     0    0   1    1
    138   2    0.0%   0.0    0.4     0    0   1    1
    167   2    0.0%   0.0    0.4     0    0   1    1
    43    1    0.0%   0.0    0.2     0    0   1    1
    58    1    0.0%   0.0    0.2     0    0   1    1
    133   1    0.0%   0.0    0.2     0    0   1    1
    142   1    0.0%   0.0    0.2     0    0   1    1
    145   1    0.0%   0.0    0.2     0    0   1    1
    149   1    0.0%   0.0    0.2     0    0   1    1
    150   1    0.0%   0.0    0.2     0    0   1    1
    152   1    0.0%   0.0    0.2     0    0   1    1
    155   1    0.0%   0.0    0.2     0    0   1    1
    157   1    0.0%   0.0    0.2     0    0   1    1
    159   1    0.0%   0.0    0.2     0    0   1    1
    162   1    0.0%   0.0    0.2     0    0   1    1
    164   1    0.0%   0.0    0.2     0    0   1    1
    165   1    0.0%   0.0    0.2     0    0   1    1
Total:  4338         48.2   964     625  24  192  216

Column F total is the sum, because each of the lines in the table queries a different hostname, and thus they are separate cache groups.

Columns G, H, J are maximums, as all lines query the same hostname, and thus they are part of the same cache group. (If you query all 9 TXT records for a connection from net 210, you will not query them again for a connection from another net, because all 9 records are already in the local cache.)

So in this realistic example, you generate 625+25+192=842 queries, while I generate fewer than 216. My mask can be improved a lot without adding extra queries, which would make my total of 216 even lower: the better the mask, the lower the query count.

For your mask to be more narrow, you have to publish more A records, as you have shown (the GENERATE 1.$ example), which would generate even more queries. So as the compiler tries to make the record more efficient, in my mask case it grows the mask from 61 bytes to something longer and decreases the number of lookups (probably drastically). In your case, the top record stays the same length as it changes from %{ir1} to %{ir2}, but now it requires a query for every unique connection from a class B network, which brings your total for column F toward 625*625, as each line in my table blows up into 254 lines to accommodate all the class Bs. Maybe less than 625*625, but the query requirement still grows geometrically.

Also note the way the distribution tails off. Because I don't have a very large volume, some of the nets I see only once every 3 months; others I see even less frequently. Because this distribution tends asymptotically to zero, it means that if my volume of connections were higher, I'd see that my host is accessible by SMTP from even more class A nets.

If the distribution had dropped sharply, such that I had seen no fewer than N>2 connections from any one class A net, I could have _assumed_ that I had seen connections from all the possible class A networks that a connection could come from.

This conclusion about distribution works against your exists proposal, and in favour of my mask proposal, as you can see.

1 for the exists and 1 for the MX (if the entire MX list fits in the
additional portion of the MX response).  In any case, this gains back
some of the usefulness of the other mechanisms without having to
recompile (or test for needing to recompile) continually and without
forcing their complex evaluation in all instances.  The cache expire
time for the records used in exists should definitely be kept low.

Excellent! so let's look at a 24 hour period. Say that we get 2540 connections per hour, 10 from each class A network. Let's assume a TTL of 24H for the MX, 1H for the exists records, and 1H for the TXT record.

Recall I explained why the exists records have the same TTL as the TXT record.

Total traffic with your method:
1*MX + 24*TXT  +  24*254*A = 6121 queries during the 24H period.

In total, you called ns_resolv 24*2540*3 times (182880 times). So the cache saved you traffic 96.6% of the time.
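As a sanity check on the arithmetic above, using the numbers already given (254 class A nets, 2540 connections/hour, 3 lookups per connection):

```python
# exists method, all records at 1h TTL except the 24h MX:
mx_q = 1                    # one MX refresh over 24h
txt_q = 24 * 1              # the TXT record, refreshed hourly
exists_q = 24 * 254         # one exists A query per class A net per hour
total_queries = mx_q + txt_q + exists_q      # 6121

lookups = 24 * 2540 * 3     # every connection triggers 3 ns_resolv calls
savings = 1 - total_queries / lookups        # fraction answered from cache
```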


This is a very convenient calculation that makes my masking method using
exists look significantly worse.  I don't believe it actually needs to
be that bad.  You assume that all the queries in exists would need a
short TTL, or even the same TTL, and I initially agreed because of the
failover scenario.  One of the advantages of my method, even taking into
account your "I need a short TTL so I can fail over" scenario, is that
all the other SPF mechanisms are usable (as long that they don't cross
administrative boundaries where you don't know how things could change)
without having to recompile the record at all.

I have taken this into account in the analysis above. It was my honest mistake, and I apologize.

You should keep the TTL for any given A record used in exists low if you
plan on using an address in that class A as part of your failover
plan.  Fortunately, most of them won't be used.  If you're the kind of
person who is prepared for failover such that the TTL is a concern, you
already know where you are going to fail over to (it may even be one of
the addresses already listed in the MX).  Say my MX is on 1/8
and my failover is at a different ISP (which is otherwise unlisted, not
even as a backup MX) on 2/8.  I have these records:

                       24h IN TXT "v=spf1 -exists:%{ir1}._spf.%{d}"
                                  " +mx -all"
                        1h IN MX 10 mailhost
$GENERATE 3-254 $._spf 24h IN  A 127.0.0.1
                2._spf  1h IN  A 127.0.0.1
              mailhost  1h IN  A 1.1.1.1

I would prefer to keep to our ebay example.

But let's look quickly at this new example you propose.

Both the "exists mask" and the "modifier mask" would be inserted by the compiler, as it sees necessary to save traffic, right?

In that case, the compiler would never insert the exists mechanism above, since it only makes matters worse.

It would compile that record simply to
                       1h IN TXT "v=spf1 ip4:1.1.1.1 -all"

This record would be queried at most 24 times in a day, and it preserves the initial intent that the mailhost may be moved at 1-hour's notice.

Even if it used my mask modifiers, it would not add any, since all the addresses are visible in the first TXT query, and there would be no subsequent queries. No possible savings, so no need for masking of any kind.

That's why I wanted to stick to the ebay example: even when fully optimized, that record doesn't fit in 1 UDP packet, so it must be broken up into multiple queries. That's where masks shine: avoiding subsequent queries when they can't be avoided by compiling everything into a list of IPs that fits in one UDP packet.


That is, the TXT and 252 of the class A exists records are cachable for
24 hours, and the ones I need to change if I fail over (two As and the
MX) are 1 hour.  At 2540 connections per hour, 10 from each class A,
this design makes

        24*MX + 1*TXT + 1*252*A + 24*2A = 325 queries

calling ns_resolv 24*2540*3 (182880) times, with a cache hit percentage
of 99.82%
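The split-TTL arithmetic checks out; a quick verification with the same numbers:

```python
# 252 class A exists records and the TXT are cacheable for 24h;
# the MX and the two "hot" A records (current net + failover net) use 1h:
total_queries = 24 * 1 + 1 + 1 * 252 + 24 * 2    # 24 MX + 1 TXT + 252 A + 48 A

lookups = 24 * 2540 * 3        # same 3 ns_resolv calls per connection
hit_ratio = 1 - total_queries / lookups          # ~99.82%
```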

Correct. 325 queries vs. 24. But I believe you made an error in saying that any masking is necessary in this simple case. Once you take the masking away, there's nothing left to compare :)

(like we have previously, I'm again assuming the load of 1*MX includes
the lookup of the resultant As, thus it's fixed).

I have done the same, in the interest of comparing apples to apples. If we're both right, then great, and if we're both wrong, our calculations would be off by the same factor, so the comparison is still valid :)


 By taking our actual
current and failover network information into account, the number of
queries has been reduced by nearly 95% over that 24-hour period, and
the cache hit ratio is significantly better.  And the TTLs that should
be longer can remain longer without significant hits to our failover plan.

Failover is a fascinating topic in itself, but for our purposes let's just acknowledge that it exists. It is taken into account when we choose the 1-hour TTLs vs. the 24-hour TTLs.

If I'm more correct about zombie distribution than you are, then the
largest term in the number of queries per day calculation, the 1*252*A,
might be significantly less because of the distribution of zombiable
computers being concentrated on popular class As.

I've shown above the distribution I really experience, not a theoretical distribution. Hopefully that will settle any claims of which distribution is more realistic.

I've included your original calculations for your method below for
reference.


With my method, mask included at the end of the top level TXT, total of 9 records with the same TTL of 1H. The records are fully compiled and contain only IP4 and redirects.

Total traffic with my method:
24*1*TXT = 24 queries, if the mask is top notch.
24*9*TXT = 216 queries, if the mask is useless.

More likely the actual number of queries is between 24 and 216.

In total, I called ns_resolv 24*2540*1=60960 times if the mask was top notch, and 24*2540*9=548640 times if the mask was useless. So the cache saved me traffic exactly 99.96% of the time, whether the mask was good or not.

As you can see, there's a huge difference, and most of it is owed to the fact that the exists are AGAU, even though 96.6% _looks_ like a pretty high cache efficiency.
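The "exactly 99.96% either way" claim is not a coincidence: both the query count and the ns_resolv count scale by the same factor (the number of TXT records), so the ratio is invariant. A small check:

```python
connections = 24 * 2540        # connections over the 24h window

def mask_method(txt_records: int):
    """Queries and cache-hit ratio for the mask method with a record
    split over `txt_records` TXT records, all at 1h TTL."""
    queries = 24 * txt_records             # each TXT record refreshed hourly
    lookups = connections * txt_records    # each connection walks every record
    return queries, 1 - queries / lookups

good = mask_method(1)   # top-notch mask: 24 queries
bad = mask_method(9)    # useless mask: 216 queries
# Cache-hit ratio is identical: queries and lookups scale together.
```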


Let's keep in mind that we are not comparing the same exact records, but
it shouldn't matter much.  If all of ebay's sending IPs can be encoded
in a single A record, you could substitute the lookup for that A record
for the +mx in my sample record.  It would still be the same load.

Well, in my analysis above, I reverted to using the same records, and only comparing the masking scheme. All else being equal, I looked only at the differences between masking schemes. Comparing different records is a waste of our time.

As a comparison, and for the record, here are the numbers for the same
record without using any kind of masking:

                       24h IN TXT "v=spf1 +mx -all"
                        1h IN MX 10 mailhost
              mailhost  1h IN  A 1.1.1.1

That's 24*MX + 1*TXT = 25 queries, and calling ns_resolv 24*2540*2
(121920) times with a cache hit ratio of 99.9795%.  Note, again, I
didn't include the A record lookup for mailhost, because it wasn't
included in any of the other calculations.
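These numbers also check out under the same assumptions (2 lookups per connection, no A lookup for mailhost counted):

```python
queries = 24 * 1 + 1            # 24 hourly MX refreshes + 1 daily TXT refresh
lookups = 24 * 2540 * 2         # each connection: one TXT + one MX lookup
hit_ratio = 1 - queries / lookups        # ~99.98%
```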

Short records are not relevant for the masking discussion. If a compiler is somewhere in the loop, this record would become:

                        1h IN TXT "v=spf1 ip4:1.1.1.1 -all"

This compiled record results in 24 queries, instead of the initial 25.

BUT!!!

In an every-day case where there is no virus, this 24 vs. 25 query comparison is valid only if the MTA in question receives a lot of email from the publishing domain (more than 1 per hour) and/or a lot of forgeries.

But consider an obscure little domain that doesn't get forged much and doesn't send out much email either (say it sends 1 mail per day to the SPF-checking MTA). The compiled record would generate 1 query per day, while the uncompiled one would generate 2 queries per day: double the traffic. Of course, 2 queries per day is nothing to worry about. But when the domain gets forged once per day and the forged email gets sent to 20,000 MTAs per day, you now have 40,000 queries for the uncompiled record but only 20,000 queries for the compiled record. That starts to become significant, especially if your-little-obscure-domain-name-requires-query-packets-that-are-quite-big.com, as your DNS provider may charge per Mbyte. We've seen instances where queries over a monthly limit were being charged at $5/MB, IIRC.
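To put a rough number on that, here is a sketch of the cost difference. The 512-byte response size is a made-up assumption for illustration; only the 20,000-MTA forgery volume and the $5/MB overage rate come from the scenario above:

```python
mtas = 20_000                        # MTAs receiving the daily forgery
queries_compiled = mtas * 1          # compiled record: single TXT lookup
queries_uncompiled = mtas * 2        # uncompiled record: TXT + MX lookups
bytes_per_response = 512             # assumed response size (made up)

extra_mb = (queries_uncompiled - queries_compiled) * bytes_per_response / 1e6
cost_per_day = extra_mb * 5          # at the quoted $5/MB overage rate
```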

But this is a discussion on the value of using a compiler. Let's make a separate thread of this too, if you wish to continue.

I don't know about your mail reader, but in thunderbird, the subject of the emails is so far to the right in threaded view mode that it is off the screen :)



The remaining mechanisms
would have to be really expensive (in terms of number of queries and
query cachability) to make masking mean something.  The typical case
(legit mail) is made worse by planning to be able to handle the atypical
case (SPF-Doom attack!).  If the numbers you've been preaching are
correct, masking may be a good trade off for complex, amplifying
records.

I think my preaching is sound! ;)
But please do shut it down if you see holes in it. Otherwise, all this preaching would be a waste of my and your time, and we all have other things to do too, I'm sure... like sleep, in your case ;)

I still think this should be evaluated on a case-by-case basis.  Masking
using exists or compiling and using a masking directive can make simple
records worse, especially if they would overflow into multiple records
because of include flattening.

That case of overflow is the only case where masking is useful, so those are the only cases we should use to evaluate the merits of the proposed masking ideas.

If no compiler
   there is no masking, or you'd have to insert it manually,
   which is inconvenient and error prone.

elseif compiler is used
   If compiled with cron, or once in a while, -flatten should
   not be used. There will be left-over mechanisms whose
   resulting IP addresses may change (administrative gap).
   Thus, masks MUST NOT be added, since while they work initially,
   they would break the record when the ISP changes its
   infrastructure.
   If compiled with cron, and -flatten not used, but the record
   compiles into a list of IPs anyway (i.e., you list no mechanisms
   that lie outside your administrative boundary), then it may
   include masks if useful.

   Masks can only be reliably inserted when your record is
   completely in your administrative control, as above, or if
   the compiler runs as part of the DNS server.
   In that case, the record can be safely flattened *only if*
   the TTLs of all mechanisms are respected,
   including those across the domain boundary. In that case, the
   record, and implicitly the mask, get regenerated every time
   the IP list gets regenerated because of expired TTLs. So
   the mask always reflects the current record.

   Also, there's an additional condition on inserting masks.
   A mask may only be inserted if all mechanisms that cannot be
   compiled into an IP list (those that use the %{l} or %{i} macros)
   are brought up into the top TXT record.
   In other words, a mask may only be inserted if the remaining
   mechanisms in subsequent redirects/includes contain only
   IP lists.
end if


I'm going to sleep on this for a little while, and see how the exists: method can be better than the mask method.


Well, the most obvious way :) it is better is that it is implementable
as soon as yesterday, without having to change the spec, redeploy SPF
evaluators to make them mask-syntax aware, install stunt DNS servers or
upgrade DNS software.  In fact, if you are using bind9, the $GENERATE
construct allows easy and quick generation of the necessary class A
records without using an SPF record compiler or outside script.

Both mechanisms are implementable now, because my mask is a modifier that is neither required to be added nor required to be used by checkers. It does not *require* any changes, unless you want to be able to specify long but convenient SPF records that can be compiled into IP lists.

Existing SPF checkers would just ignore the mask modifier.

BTW, I've been referring to doing either of our methods as "masking".
My suggestion uses exists to generate the mask; yours uses a new
mechanism (too bad it's order dependent, otherwise it could be a
modifier and thus deployed SPF evaluators would skip over it -- although
redirect= is order dependent, isn't it?)

No, my mask is a modifier, and is not order dependent. In fact, when masks are checked, all the masks should be compared, and only if *none* match the incoming IP, the evaluation can be aborted. If even one mask matches the incoming IP, it means that that range is used later in the record, so the additional queries must be done to find out if the IP matches exactly.
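The evaluation rule described here can be sketched as follows. This is a hypothetical helper of my own, not part of any SPF implementation; the stdlib `ipaddress` module does the range checks:

```python
import ipaddress

def masks_allow_early_reject(sender_ip: str, masks: list) -> bool:
    """Mask-modifier rule as described: only if *no* mask covers the
    sender's IP can evaluation be aborted early (reject without further
    queries). If any mask matches, the later mechanisms must still be
    fetched to check for an exact match."""
    ip = ipaddress.ip_address(sender_ip)
    return not any(ip in ipaddress.ip_network(m) for m in masks)

masks = ["64.0.0.0/6", "216.0.0.0/8"]    # illustrative mask list
early = masks_allow_early_reject("24.1.2.3", masks)   # no mask matches: abort
deep = masks_allow_early_reject("66.5.4.3", masks)    # 66/8 is inside 64/6: keep going
```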

One obvious way is if all forger traffic came from the same A class net all the time, _AND_ the specific address was close enough to the servers that the mask would miss it. [...] It's pretty unrealistic though, given all the restrictive ifs.


Barring some obscenely large hole on ALL networks, I think past patterns
suggest that those who are most vulnerable, and will remain vulnerable,
are those who sell consumer-oriented services (because of the nature of
consumers to not really be security oriented, and thus a target for
zombies).

Let's not forget the armies of employees who scour their mailboxes first thing Monday morning in search of jokes (including executable jokes) to get them past the Monday blues.

My spf-doom virus was coming from an employee of such a company.

But you are right, consumer services are also great candidates as targets.

I think overall, we can only really exclude the secured servers that provide no user access.

 If the mask being ineffective, however it is implemented, is
a concern, avoiding class As that are shared with home subscribers might
be wise.  BUT, using either your method or mine, you could implement
even more restrictive masks.  This could be as simple as, using my
method:

$GENERATE 2-254 $._spf   24h IN  A 127.0.0.1
$GENERATE 2-254 1.$._spf 24h IN  A 127.0.0.1

if you want to allow only the 1.1/16 range to be further evaluated.  But
at some point my method hits diminishing returns, because the "normal
case" is not an attack but rather legit email, and things like that only
add to the number of queries performed, of course.  So you have to weigh
your chances of getting attacked (and having to deal with the increased
load) and what should be considered "normal operation".  It largely
depends on where the attacks are coming from.

So what would the corresponding SPF mask look like?

-exists:%{ir1} -exists:%{ir2} mx ?

So if you happen to be ebay.com, and you send from the 64-67 class A networks, you'd have to publish that expensive mask to avoid RoadRunner's cable modem users who are on nets 24, 65 and 66?

Please look at this specific case more closely. Ebay + RoadRunner make a great study case, I think.

If there is a long-lived, sustained attack, modifying the SPF record to
include masks may be a good short-term solution (until the attacks
subside) as a way to control the load that your SPF record is putting
on receivers' and your own systems.

Masks can only be reliably inserted by a compiler. It's just not practical to install a new DNS server that compiles records, or even install a cron job (your system may use the complicated LDAP + DNS + SQL + YP alphabet soup), when the long-lived attack comes. And it takes some planning before the master, authoritative DNS server for a domain gets altered, especially if you are a DNS provider and lots of domains depend on your services.

I never meant for masks to be a reactive sh*t containment tool, but a pre-emptive sh*t preventer tool ;)


Whatever we conclude, I really enjoy these thoughtful discussions.


Heh, incidentally, I'm starting to find them tedious, but overall
interesting -- after all, I'm up at 4am (I'm in central time)
responding, so that must mean something. :)

I appreciate your effort and thoughtfulness very much, but you shouldn't lose sleep over this.

I myself got up early to see if there were any messages :)


Regards,
Radu

