spf-discuss

Re: short circuiting evaluation

2005-03-25 11:38:56
Andy Bakun wrote:
This TTL discussion on the A records that exists uses is academic, as
I've changed my position on it absolutely needing to be low.

I think changing the TTL based on load is extremely dangerous. I assume you mean that the TTL is increased when the load increases.


The TTL I was suggesting for these exists: records is either around zero
or significantly smaller than you'd normally set the TTL to if you
wanted to avoid significant downtime, such that even an increase like 2x
or 3x is still less than 5 or 10 minutes.

If you're getting hammered with queries at x per second with a TTL of 10
seconds, then your load would be x/2 if you increase the TTL to 20
seconds.  This isn't significantly different, but when there's an attack
going on, this would reduce your load without significantly hindering
your ability to fail over.
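The load arithmetic above can be sketched as a back-of-the-envelope model. This is my own illustration, assuming the authoritative server's load is bounded by cache expiry at each resolver; the resolver count and rates are made-up numbers:

```python
def steady_state_qps(client_qps: float, ttl_s: float, resolvers: int) -> float:
    """Upper bound on the query rate reaching the authoritative server:
    each caching resolver refreshes the record at most once per TTL."""
    return min(client_qps, resolvers / ttl_s)

# Made-up attack numbers: 1000 distinct caching resolvers, demand far
# above what the caches absorb.
load_ttl10 = steady_state_qps(1000.0, 10.0, 1000)   # TTL 10s
load_ttl20 = steady_state_qps(1000.0, 20.0, 1000)   # TTL 20s: half the load
```

Doubling the TTL halves the steady-state load, exactly as described, without changing anything else.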


So if something fails under the heavier load and you have to relocate it, you'll suffer longer downtime because the TTLs are longer.


There's also nothing keeping it from going the other way -- set the TTL
to 1 hour in the normal case.  If it looks like your load is up because
your domain is being forged, decrease it so that in the case you do need
to fail over, your downtime window is decreased.  That is, if you've
been serving TTLs of 1 hour, and you change to 30 minutes, then cache
entries will expire in an average of 45 minutes, thereby reducing the
length of your downtime/inaccessibility due to caching.

The instant you publish a 30-minute record, all the records you've issued thus far have been 1-hour, so all records out in the field carry 1-hour expirations. Changing the published TTL does not instantly change the average TTL out there; it takes up to 1 hour for all the previously issued records to expire and be refreshed with records carrying the new TTL.

Incidentally, if you have a spread of 1-hour TTLs out in the field, their average time-to-refresh, or remaining TTL, is 30 minutes. If you have a spread of 30-minute TTLs, the average time-to-refresh is 15 minutes. So if you do what you said, the average TTL out there would drop from 30 minutes to 15 minutes over the course of 1 hour.
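The averaging argument can be checked numerically. A minimal sketch, assuming caches fetch the record at uniformly random times, so remaining lifetime is uniform on [0, TTL]:

```python
# A record with TTL t, fetched at a uniformly random moment, has a
# remaining lifetime uniform on [0, t]; the mean time-to-refresh is t/2.
def mean_time_to_refresh(ttl_minutes: float) -> float:
    return ttl_minutes / 2.0

avg_before = mean_time_to_refresh(60)   # 1-hour TTLs: 30 minutes on average
avg_after = mean_time_to_refresh(30)    # 30-minute TTLs: 15 minutes on average
```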


I'll think about whether it would actually help in this scenario, but for a typical case, say the TTL of a web-server address, it's definitely a bad idea to increase the TTL as you're pushing the equipment closer to failure.


Sure, except we are specifically not talking about "typical cases" here,
and especially not web-servers.  If I send email and your server is
overloaded, I may get a DSN from my server saying it temporarily can't
connect, but assuming your load comes down in a reasonable amount of
time, the mail will go through with no action on anyone else's part.
Web servers are really different: if your retail website isn't
responding, people will immediately go to your competitors.

In the "typical case", email goes through.  The atypical case is the
mythical SPF-doom virus that is pounding on mail servers causing mail
servers to pound on DNS through SPF.  I thought we were trying to
optimize for the atypical case here.

Slight correction: The virus _taps_ on MTAs, and MTAs _pound_ on DNS. ;)

You're right, the mail situation is somewhat different. While I consider a 4-hour delay in email unacceptable, ymmv. 4 hours is the default time after which sendmail warns that it hasn't been able to deliver, but when it gives the warning, it's still unknown how much longer it will have to keep trying.

But let's not concentrate on that since it takes energy away from the real discussion. We can start a separate thread for fail-over if you like.

Ok, but let's look at a higher connection rate for a second.

If you get 255 connections from around the world, 1 from each class A net, you have to do 1 query (TXT) + 243 A (for those which are in different class A nets than ebay's servers) + 8 queries for those that are in the same class A nets as ebay.


This is a fine thought experiment, but most likely not that realistic. Chances are, most, if not all, zombied machines are going to come from
some small (and maybe even predictable) set of class A addresses during
any given single attack.  It seems most of the single digit class As are
out immediately, for example.  I'd think large attacks are going to come
from IP blocks that are hosting connectivity services sold to the
public/consumers.

Very well. I used my 3-month maillog as a research resource again, and found that my mail server's port 25 is accessible from most corners of the world. Below I show the distribution of incoming connections from hosts in different class A nets.

My host sees very modest volumes of SMTP activity, and thus checks very few SPF records; my total number of connections per day is only about 48. A more central mail server would see far more connections.

To estimate what a more central mail server might see, I multiplied the number of connections I see by a factor of 20 (still very modest). Hotmail bounces 2 billion spams a day; a site that large would have to do far more exists queries than my setup*20, and may get much closer to the worst case I described before. But let's see how close a very modest _realistic_ case would get.

Anyway, I have applied the distribution I saw at my server, but at 20x the volume, and I will compare how many queries your exists method might require vs. how many the mask method might require:

Please note that multiplying the number of connections does not change the shape of the distribution. For A nets that I saw no connections from, the multiplication will not create any connections. However, for nets where I get a small number of daily connects on average, that daily average is increased 20x (e.g., from net 212 I see about 0.3 connections per day; after the volume increase, about 7 per day: the table rounds 6.7 up to 7 for net 212).

Column A is the Class A net.
Column B is the total number of connections from an IP in that net that I have seen in the last 3 months
Column C is the percentage out of total number of connects.
Column D is the average daily number of connects from that net (col B/90)

Column E is the estimated number of connects for a more central site (ohmi's average daily connects * 20). I will refer to this as connections_per_24H.

For the exists column (F), I have used the following formula:

    if (connections_per_24H > 24) queries = 24;
    else queries = connections_per_24H

Column G is the number of MX queries you'd have to do. I have assumed that we keep to the ebay example. In that context, your published TXT record would be "v=spf1 -exists:%{ir1}._spf.%{d} +... the long list of IPs, spread over the same number of _s extensions as my example". Since my mask takes 61 bytes and your mechanism takes 24 bytes, in your case the ebay record may be shorter by 1 TXT query.

Column H is the number of TXT queries that you'd have to do with the exists method (=8*column_G). I gave you the benefit of the doubt and assumed that your functionally identical record to mine would be 8 TXT records instead of 9, because of the shorter mask string.

For column J I used the number of TXT queries that my previous ebay record would generate (=9*column_G) if we used the mask method, with a mask equivalent to your "exists mask":

-m=64/6  m=80 m=194 m=203 m=206 m=209 m=210 m=212 m=216 m=220

I have already pointed out that this mask is very poor; a better mask would have higher CIDRs and therefore be more narrow. But I want to compare apples to apples. The information published by the DNS server is the same in both cases, just published in different ways. In my way, it is published as the 61-byte string above, appended to the top TXT record at ebay.com; in your case, it would be published as the 254-13=241 A records generated by the compiler. Actually, in both cases the information is generated by the compiler.

Since my mask can be made narrower without adding extra queries, the numbers in column J should read "<=", i.e. _at most_ 9 queries. Nonetheless, the total for column J is based on 9 queries. It looks like in the past 3 months I got no connects from one of the nets that ebay uses, otherwise the total would have been 9*24.
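For reference, the per-net columns in the table below can be reproduced with a short sketch. This is my reconstruction of the formulas above; EBAY_NETS is simply the set of nets showing a nonzero column G in the table:

```python
# Class A nets with a nonzero column G in the table below
# (the nets the ebay record sends from and that I saw connects from).
EBAY_NETS = {64, 65, 66, 67, 80, 194, 203, 206, 209, 210, 212, 216}

def columns(net: int, conn_per_24h: float):
    f = min(round(conn_per_24h), 24)   # column F: exists queries, 1h TTL caps at 24
    g = f if net in EBAY_NETS else 0   # column G: connects that walk the TXT chain
    h = 8 * g if g else 1              # column H: exists method, 8 TXT records
    j = 9 * g if g else 1              # column J: mask method, 9 TXT records
    return f, g, h, j

# Net 210 sees 14.7 connections/day in the scaled-up data:
row_210 = columns(210, 14.7)   # matches the table row: (15, 15, 120, 135)
```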

    A      B     C     D       E      F   G   H    J
    65    849  19.6%  9.4    188.7   24   24  192  216
    66    562  13.0%  6.2    124.9   24   24  192  216
    61    208  4.8%   2.3    46.2    24   0   1    1
    24    177  4.1%   2.0    39.3    24   0   1    1
    211   177  4.1%   2.0    39.3    24   0   1    1
    218   153  3.5%   1.7    34.0    24   0   1    1
    200   130  3.0%   1.4    28.9    24   0   1    1
    69    112  2.6%   1.2    24.9    24   0   1    1
    220   110  2.5%   1.2    24.4    24   0   1    1
    82    108  2.5%   1.2    24.0    24   0   1    1
    202   101  2.3%   1.1    22.4    22   0   1    1
    222   97   2.2%   1.1    21.6    22   0   1    1
    68    89   2.1%   1.0    19.8    20   0   1    1
    221   85   2.0%   0.9    18.9    19   0   1    1
    81    74   1.7%   0.8    16.4    16   0   1    1
    207   69   1.6%   0.8    15.3    15   0   1    1
    210   66   1.5%   0.7    14.7    15   15  120  135
    132   63   1.5%   0.7    14.0    14   0   1    1
    219   59   1.4%   0.7    13.1    13   0   1    1
    38    57   1.3%   0.6    12.7    13   0   1    1
    67    56   1.3%   0.6    12.4    12   12  96   108
    80    56   1.3%   0.6    12.4    12   12  96   108
    213   56   1.3%   0.6    12.4    12   0   1    1
    83    54   1.2%   0.6    12.0    12   0   1    1
    4     51   1.2%   0.6    11.3    11   0   1    1
    62    49   1.1%   0.5    10.9    11   0   1    1
    217   48   1.1%   0.5    10.7    11   0   1    1
    64    44   1.0%   0.5    9.8     10   10  80   90
    84    43   1.0%   0.5    9.6     10   0   1    1
    203   42   1.0%   0.5    9.3     9    9   72   81
    216   38   0.9%   0.4    8.4     8    8   64   72
    201   36   0.8%   0.4    8.0     8    0   1    1
    206   36   0.8%   0.4    8.0     8    8   64   72
    209   33   0.8%   0.4    7.3     7    7   56   63
    192   32   0.7%   0.4    7.1     7    0   1    1
    194   31   0.7%   0.3    6.9     7    7   56   63
    212   30   0.7%   0.3    6.7     7    7   56   63
    85    22   0.5%   0.2    4.9     5    0   1    1
    60    21   0.5%   0.2    4.7     5    0   1    1
    12    20   0.5%   0.2    4.4     4    0   1    1
    70    19   0.4%   0.2    4.2     4    0   1    1
    168   17   0.4%   0.2    3.8     4    0   1    1
    63    15   0.3%   0.2    3.3     3    0   1    1
    204   13   0.3%   0.1    2.9     3    0   1    1
    205   13   0.3%   0.1    2.9     3    0   1    1
    59    11   0.3%   0.1    2.4     2    0   1    1
    195   10   0.2%   0.1    2.2     2    0   1    1
    129   8    0.2%   0.1    1.8     2    0   1    1
    198   8    0.2%   0.1    1.8     2    0   1    1
    193   7    0.2%   0.1    1.6     2    0   1    1
    151   6    0.1%   0.1    1.3     1    0   1    1
    163   6    0.1%   0.1    1.3     1    0   1    1
    131   5    0.1%   0.1    1.1     1    0   1    1
    144   5    0.1%   0.1    1.1     1    0   1    1
    208   5    0.1%   0.1    1.1     1    0   1    1
    130   4    0.1%   0.0    0.9     1    0   1    1
    141   4    0.1%   0.0    0.9     1    0   1    1
    71    3    0.1%   0.0    0.7     1    0   1    1
    136   3    0.1%   0.0    0.7     1    0   1    1
    148   3    0.1%   0.0    0.7     1    0   1    1
    161   3    0.1%   0.0    0.7     1    0   1    1
    166   3    0.1%   0.0    0.7     1    0   1    1
    196   3    0.1%   0.0    0.7     1    0   1    1
    128   2    0.0%   0.0    0.4     0    0   1    1
    138   2    0.0%   0.0    0.4     0    0   1    1
    167   2    0.0%   0.0    0.4     0    0   1    1
    43    1    0.0%   0.0    0.2     0    0   1    1
    58    1    0.0%   0.0    0.2     0    0   1    1
    133   1    0.0%   0.0    0.2     0    0   1    1
    142   1    0.0%   0.0    0.2     0    0   1    1
    145   1    0.0%   0.0    0.2     0    0   1    1
    149   1    0.0%   0.0    0.2     0    0   1    1
    150   1    0.0%   0.0    0.2     0    0   1    1
    152   1    0.0%   0.0    0.2     0    0   1    1
    155   1    0.0%   0.0    0.2     0    0   1    1
    157   1    0.0%   0.0    0.2     0    0   1    1
    159   1    0.0%   0.0    0.2     0    0   1    1
    162   1    0.0%   0.0    0.2     0    0   1    1
    164   1    0.0%   0.0    0.2     0    0   1    1
    165   1    0.0%   0.0    0.2     0    0   1    1
Total:  4338         48.2   964     625  24  192  216

Column F total is the sum, because each of the lines in the table queries a different hostname, and thus they are separate cache groups.

Columns G, H, J are maximums, as all lines query the same hostname, and thus they are part of the same cache group. (If you query all 9 TXT records for a connection from net 210, you will not query them again for a connection from another net, because all 9 records are already in the local cache.)

So in this realistic example, you generate 625+25+192=842 queries, while I generate fewer than 216. My mask can be improved a lot without adding extra queries, which would make my total of 216 even lower: the better the mask, the lower the query count.

For your mask to be more narrow, you have to publish more A records, as you have shown (the GENERATE 1.$ example), which would generate even more queries. So as the compiler tries to make the record more efficient, in my mask case it grows the mask from 61 bytes to something longer and decreases the number of lookups (probably drastically). In your case, the top record stays the same length as it changes from %{ir1} to %{ir2}, but now it requires a query for every unique connection from a class B network, which brings your total for column F toward 625*625, as each line in my table blows up into 254 lines to accommodate all the class Bs. Maybe less than 625*625, but the query requirement still grows geometrically.

Also note the way the distribution tails off. Because I don't have a very large volume, some of the nets I see only once every 3 months; others I see even less frequently. Because this distribution tends asymptotically to zero, it means that if my volume of connections were higher, I'd see that my host is accessible by SMTP from even more class A nets.

If the distribution had dropped sharply, such that I had seen no fewer than N>2 connections from any one class A net, I could have _assumed_ that I had seen connections from all the possible class A networks that a connection could come from.

This conclusion about distribution works against your exists proposal, and in favour of my mask proposal, as you can see.

1 for the exists and 1 for the MX (if the entire MX list fits in the
additional portion of the MX response).  In any case, this gains back
some of the usefulness of the other mechanisms without having to
recompile (or test for needing to recompile) continually and without
forcing their complex evaluation in all instances.  The cache expire
time for the records used in exists should definitely be kept low.

Excellent! so let's look at a 24 hour period. Say that we get 2540 connections per hour, 10 from each class A network. Let's assume a TTL of 24H for the MX, 1H for the exists records, and 1H for the TXT record.

Recall I explained why the exists records have the same TTL as the TXT record.

Total traffic with your method:
1*MX + 24*TXT  +  24*254*A = 6121 queries during the 24H period.

In total, you called ns_resolv 24*2540*3 times (182880 times). So the cache saved you traffic 96.6% of the time.
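As a sanity check on the arithmetic above, using the numbers already given (254 class A nets, 2540 connections/hour, 3 lookups per connection):

```python
# exists method, all records at 1h TTL except the 24h MX:
mx_q = 1                    # one MX refresh over 24h
txt_q = 24 * 1              # the TXT record, refreshed hourly
exists_q = 24 * 254         # one exists A query per class A net per hour
total_queries = mx_q + txt_q + exists_q      # 6121

lookups = 24 * 2540 * 3     # every connection triggers 3 ns_resolv calls
savings = 1 - total_queries / lookups        # fraction answered from cache
```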


This is a very convenient calculation that makes my masking method using
exists look significantly worse.  I don't believe it actually needs to
be that bad.  You assume that all the queries in exists would need a
short TTL, or even the same TTL, and I initially agreed because of the
failover scenario.  One of the advantages of my method, even taking into
account your "I need a short TTL so I can fail over" scenario, is that
all the other SPF mechanisms are usable (as long that they don't cross
administrative boundaries where you don't know how things could change)
without having to recompile the record at all.

I have taken this into account in the analysis above. It was my honest mistake, and I apologize.

You should keep the TTL for any given A record used in exists low if you
plan on using an address in that class A as part of your failover
plan.  Fortunately, most of them won't be used.  If you're the kind of
person who is prepared for failover such that the TTL is a concern, you
already know where you are going to fail over to (it may even be one of
the addresses already listed in the MX).  Say my MX is on 1/8
and my failover is at a different ISP (which is otherwise unlisted, not
even as a backup MX) on 2/8.  I have these records:

                       24h IN TXT "v=spf1 -exists:%{ir1}._spf.%{d}"
                                  " +mx -all"
                        1h IN MX 10 mailhost
$GENERATE 3-254 $._spf 24h IN  A 127.0.0.1
                2._spf  1h IN  A 127.0.0.1
              mailhost  1h IN  A 1.1.1.1

I would prefer to keep to our ebay example.

But let's look quickly at this new example you propose.

Both the "exists mask" and the "modifier mask" would be inserted by the compiler, as it sees necessary to save traffic, right?

In that case, the compiler would never insert the exists mechanism above, since it only makes matters worse.

It would compile that record simply to
                       1h IN TXT "v=spf1 ip4:1.1.1.1 -all"

This record would be queried at most 24 times in a day, and it preserves the initial intent that the mailhost may be moved at 1-hour's notice.

Even if it used my mask modifiers, it would not add any, since all the addresses are visible in the first TXT query, and there would be no subsequent queries. No possible savings, so no need for masking of any kind.

That's why I wanted to stick to the ebay example: even when fully optimized, that record doesn't fit in 1 UDP packet, so it must be broken up into multiple queries. That's where masks shine: avoiding subsequent queries when they can't be avoided by compiling everything into a list of IPs that fits in one UDP packet.


That is, the TXT and 252 of the class A exists records are cachable for
24 hours, and the ones I need to change if I fail over (two As and the
MX) are 1 hour.  At 2540 connections per hour, 10 from each class A,
this design makes

        24*MX + 1*TXT + 1*252*A + 24*2A = 325 queries

calling ns_resolv 24*2540*3 (182880) times, with a cache hit percentage
of 99.82%
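The split-TTL arithmetic checks out; a quick verification with the same numbers:

```python
# 252 class A exists records and the TXT are cacheable for 24h;
# the MX and the two "hot" A records (current net + failover net) use 1h:
total_queries = 24 * 1 + 1 + 1 * 252 + 24 * 2    # 24 MX + 1 TXT + 252 A + 48 A

lookups = 24 * 2540 * 3        # same 3 ns_resolv calls per connection
hit_ratio = 1 - total_queries / lookups          # ~99.82%
```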

Correct. 325 queries vs. 24. But I believe you made an error in saying that any masking is necessary in this simple case. Once you take the masking away, there's nothing left to compare :)

(like we have previously, I'm again assuming the load of 1*MX includes
the lookup of the resultant As, thus it's fixed).

I have done the same, in the interest of comparing apples to apples. If we're both right, then great, and if we're both wrong, our calculations would be off by the same factor, so the comparison is still valid :)


 By taking our actual
current and failover network information into account, the number of
queries has been reduced by nearly 95% over that 24-hour period, and
the cache hit ratio is significantly better.  And the TTLs that should
be longer can remain longer without significant hits to our failover plan.

Failover is a fascinating topic in itself, but for our purposes let's just acknowledge that it exists. It is taken into account when we choose the 1-hour TTLs vs. the 24-hour TTLs.

If I'm more correct about zombie distribution than you are, then the
largest term in the number of queries per day calculation, the 1*252*A,
might be significantly less because of the distribution of zombiable
computers being concentrated on popular class As.

I've shown above the distribution I really experience, not a theoretical distribution. Hopefully that will settle any claims of which distribution is more realistic.

I've included your original calculations for your method below for
reference.


With my method, mask included at the end of the top level TXT, total of 9 records with the same TTL of 1H. The records are fully compiled and contain only IP4 and redirects.

Total traffic with my method:
24*1*TXT = 24 queries, if the mask is top notch.
24*9*TXT = 216 queries, if the mask is useless.

More likely the actual number of queries is between 24 and 216.

In total, I called ns_resolv 24*2540*1=60960 times if the mask was top notch, and 24*2540*9=548640 times if the mask was useless. So the cache saved me traffic exactly 99.96% of the time, whether the mask was good or not.

As you can see, there's a huge difference, and most of it is owed to the fact that the exists are AGAU, even though 96.6% _looks_ like a pretty high cache efficiency.
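The "exactly 99.96% either way" claim is not a coincidence: both the query count and the ns_resolv count scale by the same factor (the number of TXT records), so the ratio is invariant. A small check:

```python
connections = 24 * 2540        # connections over the 24h window

def mask_method(txt_records: int):
    """Queries and cache-hit ratio for the mask method with a record
    split over `txt_records` TXT records, all at 1h TTL."""
    queries = 24 * txt_records             # each TXT record refreshed hourly
    lookups = connections * txt_records    # each connection walks every record
    return queries, 1 - queries / lookups

good = mask_method(1)   # top-notch mask: 24 queries
bad = mask_method(9)    # useless mask: 216 queries
# Cache-hit ratio is identical: queries and lookups scale together.
```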


Let's keep in mind that we are not comparing the same exact records, but
it shouldn't matter much.  If all of ebay's sending IPs can be encoded
in a single A record, you could substitute the lookup for that A record
for the +mx in my sample record.  It would still be the same load.

Well, in my analysis above, I reverted to using the same records, and only comparing the masking scheme. All else being equal, I looked only at the differences between masking schemes. Comparing different records is a waste of our time.

As a comparison, and for the record, here are the numbers for the same
record without using any kind of masking:

                       24h IN TXT "v=spf1 +mx -all"
                        1h IN MX 10 mailhost
              mailhost  1h IN  A 1.1.1.1

That's 24*MX + 1*TXT = 25 queries, and calling ns_resolv 24*2540*2
(121920) times with a cache hit ratio of 99.9795%.  Note, again, I
didn't include the A record lookup for mailhost, because it wasn't
included in any of the other calculations.
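These numbers also check out under the same assumptions (2 lookups per connection, no A lookup for mailhost counted):

```python
queries = 24 * 1 + 1            # 24 hourly MX refreshes + 1 daily TXT refresh
lookups = 24 * 2540 * 2         # each connection: one TXT + one MX lookup
hit_ratio = 1 - queries / lookups        # ~99.98%
```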

Short records are not relevant for the masking discussion. If a compiler is somewhere in the loop, this record would become:

                        1h IN TXT "v=spf1 ip4:1.1.1.1 -all"

This compiled record results in 24 queries, instead of the initial 25.

BUT!!!

In an every-day case where there is no virus, this 24 vs. 25 query comparison is valid only if the MTA in question receives a lot of email from the publishing domain (more than 1 per hour) and/or a lot of forgeries.

But consider an obscure little domain that doesn't get forged much and doesn't send out much email either (say it sends 1 mail per day to the SPF-checking MTA). The compiled record would generate 1 query per day, while the uncompiled one would generate 2 queries per day: double the traffic. Of course, 2 queries per day is nothing to worry about. But when the domain gets forged once per day and the forged email gets sent to 20,000 MTAs per day, you now have 40,000 queries for the uncompiled record but only 20,000 queries for the compiled record. That starts to become significant, especially if your-little-obscure-domain-name-requires-query-packets-that-are-quite-big.com, as your DNS provider may charge per Mbyte. We've seen instances where queries over a monthly limit were being charged at $5/MB, IIRC.
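To put a rough number on that, here is a sketch of the cost difference. The 512-byte response size is a made-up assumption for illustration; only the 20,000-MTA forgery volume and the $5/MB overage rate come from the scenario above:

```python
mtas = 20_000                        # MTAs receiving the daily forgery
queries_compiled = mtas * 1          # compiled record: single TXT lookup
queries_uncompiled = mtas * 2        # uncompiled record: TXT + MX lookups
bytes_per_response = 512             # assumed response size (made up)

extra_mb = (queries_uncompiled - queries_compiled) * bytes_per_response / 1e6
cost_per_day = extra_mb * 5          # at the quoted $5/MB overage rate
```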

But this is a discussion on the value of using a compiler. Let's make a separate thread of this too, if you wish to continue.

I don't know about your mail reader, but in thunderbird, the subject of the emails is so far to the right in threaded view mode that it is off the screen :)



The remaining mechanisms
would have to be really expensive (in terms of number of queries and
query cachability) to make masking mean something.  The typical case
(legit mail) is made worse by planning to be able to handle the atypical
case (SPF-Doom attack!).  If the numbers you've been preaching are
correct, masking may be a good trade off for complex, amplifying
records.

I think my preaching is sound! ;)
But please do shut it down if you see holes in it. Otherwise, all this preaching would be a waste of my and your time, and we all have other things to do too, I'm sure... like sleep, in your case ;)

I still think this should be evaluated on a case-by-case basis.  Masking
using exists or compiling and using a masking directive can make simple
records worse, especially if they would overflow into multiple records
because of include flattening.

That case of overflow is the only case where masking is useful, so those are the only cases we should use to evaluate the merits of the proposed masking ideas.

If no compiler
   there is no masking, or you'd have to insert it manually,
   which is inconvenient and error prone.

elseif compiler is used
   If compiled with cron, or once in a while, -flatten should
   not be used. There will be left-over mechanisms whose
   resulting IP addresses may change (administrative gap).
   Thus, masks MUST NOT be added, since while they work initially,
   they would break the record when the ISP changes its
   infrastructure.
   If compiled with cron, and -flatten not used, but the record
   compiles into a list of IPs anyway (i.e., you list no mechanisms
   that lie outside your administrative boundary), then it may
   include masks if useful.

   Masks can only be reliably inserted when your record is
   completely in your administrative control, as above, or if
   the compiler runs as part of the DNS server.
   In that case, the record can be safely flattened *only if*
   the TTLs of all mechanisms are respected,
   including those across the domain boundary. In that case, the
   record, and implicitly the mask, get regenerated every time
   the IP list gets regenerated because of expired TTLs. So
   the mask always reflects the current record.

   Also, there's an additional condition on inserting masks.
   A mask may only be inserted if all mechanisms that cannot be
   compiled into an IP list (those that use the %{l} or %{i} macros)
   are brought up into the top TXT record.
   In other words, a mask may only be inserted if the remaining
   mechanisms in subsequent redirects/includes contain only
   IP lists.
end if


I'm going to sleep on this for a little while, and see how the exists: method can be better than the mask method.


Well, the most obvious way :) it is better is that it is implementable
as soon as yesterday, without having to change the spec, redeploy SPF
evaluators to make them mask-syntax aware, install stunt DNS servers or
upgrade DNS software.  In fact, if you are using bind9, the $GENERATE
construct allows easy and quick generation of the necessary class A
records without using an SPF record compiler or outside script.

Both mechanisms are implementable now, because my mask is a modifier that is neither required to be added nor required to be used by checkers. It does not *require* any changes, unless you want to be able to specify long but convenient SPF records that can be compiled into IP lists.

Existing SPF checkers would just ignore the mask modifier.

BTW, I've been referring to doing either of our methods as "masking".
My suggestion uses exists to generate the mask; yours uses a new
mechanism (too bad it's order dependent, otherwise it could be a
modifier and thus deployed SPF evaluators would skip over it -- although
redirect= is order dependent, isn't it?)

No, my mask is a modifier, and is not order dependent. In fact, when masks are checked, all the masks should be compared, and only if *none* match the incoming IP, the evaluation can be aborted. If even one mask matches the incoming IP, it means that that range is used later in the record, so the additional queries must be done to find out if the IP matches exactly.
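The evaluation rule described here can be sketched as follows. This is a hypothetical helper of my own, not part of any SPF implementation; the stdlib `ipaddress` module does the range checks:

```python
import ipaddress

def masks_allow_early_reject(sender_ip: str, masks: list) -> bool:
    """Mask-modifier rule as described: only if *no* mask covers the
    sender's IP can evaluation be aborted early (reject without further
    queries). If any mask matches, the later mechanisms must still be
    fetched to check for an exact match."""
    ip = ipaddress.ip_address(sender_ip)
    return not any(ip in ipaddress.ip_network(m) for m in masks)

masks = ["64.0.0.0/6", "216.0.0.0/8"]    # illustrative mask list
early = masks_allow_early_reject("24.1.2.3", masks)   # no mask matches: abort
deep = masks_allow_early_reject("66.5.4.3", masks)    # 66/8 is inside 64/6: keep going
```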

One obvious way is if all forger traffic came from the same A class net all the time, _AND_ the specific address was close enough to the servers that the mask would miss it. [...] It's pretty unrealistic though, given all the restrictive ifs.


Barring some obscenely large hole on ALL networks, I think past patterns
suggest that those who are most vulnerable, and will remain vulnerable,
are those who sell consumer-oriented services (because of the nature of
consumers to not really be security oriented, and thus a target for
zombies).

Let's not forget the armies of employees who scour their mailboxes first thing Monday morning in search of jokes (including executable jokes) to get them past the Monday blues.

My spf-doom virus was coming from an employee of such a company.

But you are right, consumer services are also great candidates as targets.

I think overall, we can only really exclude the secured servers that provide no user access.

 If the mask being ineffective, however it is implemented, is
a concern, avoiding class As that are shared with home subscribers might
be wise.  BUT, using either your method or mine, you could implement
even more restrictive masks.  This could be as simple as, using my
method:

$GENERATE 2-254 $._spf   24h IN  A 127.0.0.1
$GENERATE 2-254 1.$._spf 24h IN  A 127.0.0.1

if you want to allow only the 1.1/16 range to be further evaluated.  But
at some point my method hits diminishing returns, because the "normal
case" is not an attack but rather legit email, and things like that only
add to the number of queries performed, of course.  So you have to weigh
your chances of getting attacked (and having to deal with the increased
load) and what should be considered "normal operation".  It largely
depends on where the attacks are coming from.

So what would the corresponding SPF mask look like?

-exists:%{ir1} -exists:%{ir2} mx ?

So if you happen to be ebay.com, and you send from the 64-67 class A networks, you'd have to publish that expensive mask to avoid RoadRunner's cable modem users who are on nets 24, 65 and 66?

Please look at this specific case more closely. Ebay + RoadRunner make a great study case, I think.

If there is a long-lived, sustained attack, modifying the SPF record to
include masks may be a good short-term solution (until the attacks
subside) as a way to control the load that your SPF record is putting
on receivers' and your own systems.

Masks can only be reliably inserted by a compiler. It's just not practical to install a new DNS server that compiles records, or even install a cron job (your system may use the complicated LDAP + DNS + SQL + YP alphabet soup), when the long-lived attack comes. And it takes some planning before the master, authoritative DNS server for a domain gets altered, especially if you are a DNS provider and lots of domains depend on your services.

I never meant for masks to be a reactive sh*t containment tool, but a pre-emptive sh*t preventer tool ;)


Whatever we conclude, I really enjoy these thoughtful discussions.


Heh, incidentally, I'm starting to find them tedious, but overall
interesting -- after all, I'm up at 4am (I'm in central time)
responding, so that must mean something. :)

I appreciate your effort and thoughtfulness very much, but you shouldn't lose sleep over this.

I myself got up early to see if there were any messages :)


Regards,
Radu

