spf-discuss

Re: Re: DNS load research

2005-03-21 12:01:49
Andy Bakun wrote:
Have we fully explored different weights for each mechanism based on
what kind of DNS load they exhibit?

Not by a longshot.

We are kind of at a stalemate here.  Radu wants everyone to have a
"zero load SPF record", which is actually a valid goal (and good
buzz-phrase).  Others don't want to suggest that people not use the
features of SPF that actually make it SPF because those features have
legit uses (zero load SPF records are just RMX in disguise).  There is a
middle ground.

I would say 'low load' or 'minimum load'. I understand that zero is not feasible.

These numbers below are just an example.  I chose these weights based on
my understanding of how expensive each one is to DNS, how likely it is
that the MTA would do that lookup anyway, and whether the result is
logically cacheable; my understanding may be flawed.

        mechanism/modifier  |  weight
                all         |    0
                a           |    2
                mx          |    1
                ptr         |    2
                ip4         |    0
                ip6         |    0
              include       |    1
              exists        |    3
              redirect      |    2

        The baseline is 2.  I've given mx 1 because the MTA needs to
        look this up anyway, so it's a lookup, but it's cheaper than
        other queries that the MTA might not need to do without SPF
        (although, many MTAs do a and ptr lookups, but that's not
        required to accept mail, and may be less required when SPF sees
        significant deployment).
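As a sketch, here is how such a weighting might be applied to a record. The weight table matches the proposal above; the parser and function name are purely illustrative, not part of any SPF specification:

```python
# Hypothetical per-term DNS "weight" table from the proposal above.
# These values are the discussion's example, not a standard.
WEIGHTS = {
    "all": 0, "a": 2, "mx": 1, "ptr": 2,
    "ip4": 0, "ip6": 0, "include": 1, "exists": 3, "redirect": 2,
}

def record_weight(spf_record: str) -> int:
    """Sum the weights of all mechanisms/modifiers in an SPF record."""
    total = 0
    for term in spf_record.split()[1:]:      # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")           # drop any qualifier prefix
        # strip ":value" (mechanisms) or "=value" (modifiers)
        name = term.split(":", 1)[0].split("=", 1)[0].lower()
        total += WEIGHTS.get(name, 0)
    return total

print(record_weight("v=spf1 a mx ip4:192.0.2.0/24 include:example.net -all"))
# a(2) + mx(1) + ip4(0) + include(1) + all(0) = 4
```

A checker could then reject records whose total weight exceeds some agreed limit, analogous to the current mechanism-count limit.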

This is not quite correct. When you get incoming spam, you have to look up MX records for domains that you would otherwise have no reason to look up, since you don't correspond with them.

The MX is an indirect mech. Every time you see MX, be ready for at least two queries. One to get the list of MX mailers, and at least one to get the A record of the first mailer. Also, when you see MX, you have no idea how many lookups it will take to get to the bottom of it. You have to do one query to find out.

For these reasons, the MX mechanism is at least twice as expensive as an A mechanism.
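The indirection can be modeled with a toy query counter. This is an assumption for illustration, not a real resolver: one query fetches the MX host list, then each listed host needs its own A lookup before any IP can be compared against the sender:

```python
# Toy model of mx-mechanism cost: 1 MX query plus one A query per
# MX host not already in the resolver cache. Illustrative only.
def mx_query_count(mx_hosts: list[str], cached_a: set[str] = frozenset()) -> int:
    queries = 1                      # the MX query itself
    for host in mx_hosts:
        if host not in cached_a:     # each uncached host costs an A query
            queries += 1
    return queries

print(mx_query_count(["mx1.example.com"]))                     # 2: minimum cost
print(mx_query_count(["mx1.example.com", "mx2.example.com"]))  # 3
```

Even in the best case (a single, uncached MX host) the count is 2, versus 1 for a plain "a" mechanism, which is the "at least twice as expensive" point above.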

Also, MX mechanisms really are worthless, but expensive:

When you list an MX mechanism, there can be two possible scenarios:

1. You control that MX mechanism.
   So you know all the mailers, and you should list them (by IP :) )

2. You don't control it, it's in someone else's domain.
   So you don't know the mailers, and you're guessing.
   You've a better chance of guessing wrong than right, as in many
   installations, outgoing mail goes through different servers than
   incoming (which is what MX is for).
   We used t-online.de as an example of this guess-work.

I'm not sure what the total allowed weight should be before returning
PermError, but I don't see any problem with using Wayne's current
values.  Again, larger limits make the hard things (complex setups)
possible.  But people need to realize that their setup is complex, as a
way to drive change.  Weighing the SPF record like this could be a step
in that direction.

I believe it's not the mechanisms themselves that have the highest load potential. We'd have to look at what is cacheable and what isn't to get a picture of where the load really is.

In the following, I will keep in mind a scenario with 50 zombies, each sending me mail forged to show 200 different users (randomly generated names, like sldkjsfoiu@yahoo.com) @ the same set of 50 different domains (4 random users per domain). (I.e., the zombies all have the same mailing list to send to.) Assume that each of the 50 domains uses a mechanism like shown below; assume that the random algorithm is the same on all zombies, and it generates the same sequence of 200 unique usernames.

This scenario is one I see daily in my server logs. Fortunately not everyone is publishing SPF yet. If/when everyone does publish SPF, the costs below become much closer to reality than they are today:

Mechanisms like A:domain.com will cost 1 query across the internet, and 49 hits to the cache. Grand total: 50 queries to the internet, 50*(50*200 - 1) to the cache.

A mechanism that uses the %{d} (domain name) macro will be easily cacheable, so that will also cost 1 query on the internet and 49 to the cache. Grand total: 50 queries to the internet, 50*(50*200 - 1) to the cache.

A mech that uses %{i} (IP address) will not be so easily cacheable, and the grand total cost = number of domains * number of IPs (50*50) queries to the internet, and 50*50*(200-1) to the cache.

A mech that uses %{l} (user name of sender) will also not be easily cacheable, and will result in 50*200 queries to the internet, and 50*200*(50-1) queries to the cache.

A mech that uses both the %{i} and the %{l} or %{s} macros (like altavista.com - +exists:CL.%{i}.FR.%{s}.HE.%{h}.null.spf.altavista.com) will cost a full 50*50*200 queries to the internet. Pray that the zombies don't forge altavista, or you'll get acquainted with the wrath of SPF.
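The four cases above follow one pattern: every distinct (domain, macro expansion) pair costs one real DNS query, and all repeats are cache hits. A short sketch of that arithmetic, using the scenario's numbers (the function name is mine, purely illustrative):

```python
# Query-count arithmetic for the zombie scenario described above:
# 50 zombies x 50 forged domains x 200 unique local parts.
ZOMBIES, DOMAINS, USERS = 50, 50, 200
TOTAL = DOMAINS * ZOMBIES * USERS        # one SPF lookup per message per domain

def cost(distinct_keys_per_domain: int) -> tuple[int, int]:
    """Return (internet queries, cache hits): each distinct cache key per
    domain costs one real query; everything else is served from cache."""
    internet = DOMAINS * distinct_keys_per_domain
    return internet, TOTAL - internet

print(cost(1))                # a:domain.com or %{d}:  (50, 499950)
print(cost(ZOMBIES))          # %{i} macro:            (2500, 497500)
print(cost(USERS))            # %{l} macro:            (10000, 490000)
print(cost(ZOMBIES * USERS))  # %{i} and %{l} both:    (500000, 0)
```

The last case shows why combined macros defeat caching entirely: every single message generates a fresh query to the publisher's DNS.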

I have not mentioned this before, because these macros are the "complex setups" that make hard things possible. This was one of the fundamental reasons for SPF's existence, so I'll let it be.

The DNS limit however acts as an amplifier to the costs of the macros. So it should be kept as low as possible.

If the goal is to have everyone publish SPF, we must deal with that scenario. How much will it cost our DNS infrastructure if everyone publishes SPF? Currently only a small percentage do, and looking at the traffic numbers I don't like where they are headed.

I think you ask a great question, but to answer it well it would require a lot more research and thought.

Regards,
Radu.




-------
Sender Policy Framework: http://spf.pobox.com/
Archives at http://archives.listbox.com/spf-discuss/current/
Read the whitepaper!  http://spf.pobox.com/whitepaper.pdf
To unsubscribe, change your address, or temporarily deactivate your subscription, please go to http://v2.listbox.com/member/?listname=spf-discuss@v2.listbox.com

Attachment: radu.vcf
Description: Vcard
