Re: Modifications to SPF for Mask function

Hello, Chris,

Thank you for your comments. I will address them individually below. Point 1 remains unsolved, while the others are shown to be non-issues.



Chris Haynes wrote:

I'm sure I'm not alone in watching the current activity on a Mask function with
interest. I'm no SPF expert, but I've tried to understand SPF, and its
interactions with Sender-ID, over the last 8 months or so. There is a _huge_
amount of history here, and on MXCOMP, so I appreciate that it is probably
difficult to read all the archives and spot the critical bits of history.  The
spf drafts, however, are easily available and would repay careful attention - to
fact and to spirit.

You're right. I tried several times to make heads or tails of it, but without much luck. I wish there were an executive summary of it all.

Rather than comment point-by-point over the last few days, I thought I'd store
up some comments / observations for when you (whoever you are) seem to have a
stable proposition and to be about to draft spec. changes.

Thank you. I have a few more general modifications to bring to the draft, but I wanted to allow in-depth discussion on the most important ones, and not distract anyone's focus with the other, less important proposals. I think we are almost ready to proceed, and I will initiate the process of finalizing all the proposed changes early in the week.

1) Distinguishing the new record from Sender-ID records: There was an optimistic
suggestion that Microsoft should be told to change their identification to make
way for this new SPF version. This would not happen, but it is not necessary.
SPF has the policy prefix form v=spf1. Sender-ID uses both a different syntax
and a different version number - so the two are distinguishable. All you would
have to do is get agreement within the SPF community on the version number to
use for the version with the mask mechanism in it and it would not interfere
with Sender-ID.


There was no talk of version numbers, as they do not need to change.

The suggestion was that the PRA mechanism get its own dedicated hostname prefix (_{whatever}.domain.com) if an spf1 policy is present.

The reason is technical: the DNS reply packet has a limited amount of payload capacity. This may be very small in the case of numerous authoritative records. Combined with the load-sharing features of (most) DNS server software, it would mean that if the amount of TXT data found for a host name exceeds the UDP packet size, the DNS software may do load-balancing, and omit one of the TXT records, at random.

You can see how this hurts both SPF1 and SenderID equally, and this is counterproductive.

Since an MTA system that implements PRA would likely (necessarily) implement SPF too, and since the PRA evaluation depends on a successful SPF evaluation, it is easy to see that ideally, PRA should allow SPF to use the DNS packet space as efficiently as possible. To get to the PRA check, the MTA has to expand the SPF packet completely. SPF is the front line, if you wish. So, would it not make sense that PRA leave as much of the DNS packet space as possible to SPF, in order to minimize the queries that SPF must do? SPF requiring one more query to resolve (for fetching a record spread over more _spf{number} extensions due to less space available in the first packet) and PRA having to do another query when the SPF expanded successfully are not the same thing. The extra query done by SPF is more expensive than the extra query done by PRA, because SPF is on the front line, and the number of evaluations required by PRA is (much) smaller than the number of evaluations done by SPF. Thus, there is a (much) lower chance that the PRA query is even executed.

There will be those who will say that 1 more query doesn't matter when you are already doing a bunch.

But, keeping in mind that currently most (approx 84%) existing SPF records require fewer than 7 mechanisms, and that sharing the record space with PRA at the domain.com allows each of SPF and PRA to use about 200 bytes of that packet:

When most records out there are compiled, because they happen to be hosted on a compiling DNS service, and this will happen eventually, if SPF and PRA achieve success (note the success of SPF will prompt DNS services to optimize their costs by installing compilers, not the other way around), many of those 84% of the records will compile into IP-list records that are longer than 200 bytes.

That means that when the SPF and PRA compilers use the available space of 200 bytes each, they will each have to expand their output such that they require 3 queries total for an SPF+PRA combination that used to require 2 queries before the compiler was added. This puts "the brakes" on installing compilers. In turn it puts the brakes on saving DNS traffic. In turn it puts the brakes on SPF and PRA. To some degree, I concur.

Whereas if we separated them now in the backwards compatible way I showed, the 84% would require 2 queries after compile, instead of 3 queries. When you're talking small numbers like that, a 50% unrealized savings does not look very good.

I understand that you're going from 7 queries to 3 queries, but you could go from 7 to 2, for essentially no effort other than the willingness to look far enough into the future and plan for it. Since we pretend to be so much better than Microsoft, why don't we suggest this, and perhaps earn some (more) of their respect in the process? Please let's not make this a discussion of SPF vs. Microsoft. Let's keep it at doing what's best for the future. If we cannot agree on what's best for the future, we should drop the issue and live with the inefficiency. Politics has nearly killed SPF once, why try that again? Thank you kindly.

Separating the two record types to their own 'hostnames' is a way for the two standards to coexist and cooperate in lowering the overall cost of the solution.

2) If I understood the Mask proposal correctly, one of its benefits was claimed
to be that it would indicate the kind of non-pass (?, ~ or -) that would be
found if a sequence of 'includes' were followed - thus saving the need to
resolve all the includes to find out what kind of non-pass to signal.

This would be to _significantly_ misunderstand the way SPF is required to work.

Include mechanisms only contribute a 'match' or 'no-match' value to the
evaluation of the outermost policy (assuming no errors). What to return if no
match is found (neutral, soft-fail or fail) is defined _only_ in the outermost
record, it does not propagate back from 'include' mechanisms.

Not really. The compiler will take into account only those mechanisms inside included/redirected records that have an effect on the top-level record evaluation. So non-PASS mechanisms present in includes will be ignored when the compiler calculates the mask. All mechanisms (pass/no-pass) behind redirects will be included in the mask.

Besides, the compiler will not generate record extensions with includes, and especially would not put non-PASS mechanisms in those generated "included" records. That would only be a waste of space.

If the compiler has to use includes to other domains, because the mechanism is beyond the local administrative boundary, then it will not insert masks at all. But this should only happen when the compiler runs from cron and the record includes off-boundary mechs.

3) Semantics of a failure to match a mask.  My understanding of the intended
semantics of the Mask function is a follows:

    If a Policy contains one or more Mask modifiers,
    the IP address should be tested against these masks.
    If it should match one or more of the masks,
    then it is _possible_ that a match will be found
    when the Mechanisms are evaluated - so the sequence
    of Mechanisms should be evaluated (as if the mask(s)
     were not present).
    If one or more masks are present, and the IP address
    matches _none_ of these masks, then it is known that
    none of the  Mechanisms will match, so the policy
    as a whole has failed to provide a match.

Now - what happens in this latter case: if the IP address matches none of the
masks and this is the outermost policy? From where do you get the SPF failure
code (?, - or ~)?

The mask modifier contains that prefix, such as m=-65/5. Please note the "-".

Only one of the listed masks needs to provide the prefix, and the checkers will use any prefix found in any of the masks. It's up to the compiler to save space by only specifying the prefix once, and also up to the compiler to not specify conflicting prefixes (which would indicate a compiler bug).

When no prefix is provided, the outer policy will be assumed to be +all. When the all mechanism is missing, the mask will be m=?65/5.

The mask really indicates "How would this evaluation end?", not necessarily what the "all" indicates, since as you say, there may not be an all.
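
A minimal Python sketch of how a checker might apply the masks and their prefix. It rests on assumptions not settled in this thread: the argument 65/5 is read as the IPv4 network 65.0.0.0/5 (first octet plus prefix length), and the names parse_mask and mask_verdict are made up for illustration. The first prefix found is used, and a missing prefix is treated as "+".

```python
import ipaddress

# SPF qualifier characters and the results they stand for.
QUALIFIER_RESULT = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def parse_mask(arg):
    """Split a mask argument such as '-65/5' into (qualifier or None, network).

    Assumption: '65/5' means the IPv4 network 65.0.0.0/5.
    """
    qualifier = None
    if arg[0] in QUALIFIER_RESULT:
        qualifier, arg = arg[0], arg[1:]
    octet, _, length = arg.partition("/")
    # strict=False clears host bits, so 65.0.0.0/5 becomes 64.0.0.0/5.
    network = ipaddress.ip_network(f"{octet}.0.0.0/{length}", strict=False)
    return qualifier, network


def mask_verdict(mask_args, client_ip):
    """Return None if some mask covers client_ip (keep evaluating),
    otherwise the result named by the first prefix found ('+' if none)."""
    ip = ipaddress.ip_address(client_ip)
    parsed = [parse_mask(arg) for arg in mask_args]
    if any(ip in network for _, network in parsed):
        return None
    qualifier = next((q for q, _ in parsed if q is not None), "+")
    return QUALIFIER_RESULT[qualifier]


if __name__ == "__main__":
    masks = ["-65/5", "213/8"]
    for address in ("66.1.2.3", "213.4.5.6", "10.0.0.1"):
        print(address, mask_verdict(masks, address))
```

The checker does nothing clever here: it only tests membership and reads a prefix, which leaves all the reasoning about what the masks should be to the compiler.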

Of course the "-ip4:128.0.0.0/8 -ip4:0.0.0.0/8 +all" is the same as -all, but the mask compiler is supposed to see that. After all, to compile masks it has to be proficient in applying and recognising masks itself. In this case, the top level masks should still be m=-65/5.

This complexity is a compiler implementation issue. The checker need do nothing smart. Which works out well, since there will probably be more checker implementations than compiler implementations, so the chances of bugs are lower.

We will have to follow the approach that Wayne followed, with a regression test suite that tests the compiler's output for all kinds of corner cases and gotchas. That will ensure that all compilers produce equivalent output, and the only differentiator would be how efficient that output is when checked by a checker.

There is no mandatory last mechanism in SPF (the spec. uses only the word
SHOULD). Many people put -all or ~all, but that is not _required_. The default
value is '?'.

You might suggest that you supply it as a prefix to the Mask modifier, but (a)
this breaks the existing syntax rules and (b) what happens if there are two or
more masks and they have different prefixes?

It does not break existing syntax because the prefix is given after the m= . The word following the = ("argument") can be anything but spaces, and its syntax is specific to the modifier that uses it. So when we define the mask modifier, we'll have to also define the syntax of the "argument". But I think we've already done this in a previous discussion.

I think you need to supply an answer to this and, if it is incompatible with
existing SPF syntax rules, accept that your proposal _must_ await (or cause) a
new SPF major version.

The mask is fully backwards compatible with both currently existing records and current implementations of SPF checkers. So it will be added to the same spec, without any need to change any version number.

The implementation of a mask is not required even in future implementations, if that's what you're worried about.

This is similar to RFC1035: some new data types were introduced after the first standard came out. It was optional both to support them and to implement them.

An implementation can be better by implementing the standard feature, but better is not a requirement, just a recommendation. Perhaps one day the market will make it a requirement. Until then, it's optional, backwards compatible, and can be released *now*.

4) What is to be done if there is a failure to match all supplied Masks and a
'redirect' modifier is present?  Does the Mask purport to anticipate the result
of the 'redirect' as well? Or is the 'redirect' to be activated if none of the
masks match?  Don't forget to propose changes to section 6.1, if needed.

In that case, you do what the m=-65/5 mask says, i.e. stop processing and return 'FAIL' as the SPF result.

There is currently some discussion on when to apply the mask, but I hope we'll come up with something good.

In any case, the mask should definitely be applied before the redirect is applied. That's the whole point of the mask.

My previous suggestion was that the masks be applied after any A/MX/EXISTS/IP4 mechanisms in the current record, but before any include/redirect.

Actually, it seems that in my responses above I shot down my own suggestion, because the compiler would only use includes if they pointed outside of administrative control. In that case applying a mask on a record out of your control is definitely a no-no.

If the compiler runs in the DNS server, it would flatten some includes and just list the IPs that PASS. But it cannot flatten something like include:%{l}.domain.com, because it's not known at compile time what is to be included.


So, my new suggestion will be to apply the masks before the redirect only.

An example:

x.com : "v=spf1 ip4:1.1.1.1 ip4:2.2.2.2 ..{many others}.include:%{l}.%{d} redirect=_s2.x.com m=-65/5 m=213/8"

_s2.x.com "v=. {ip list} ..."

The mechanism that cannot be expanded at compile time will be expanded at eval time. Since it's unknown what it contains, there cannot be a mask if the mask evaluates before the include. Since we would still like to have a mask, we must require that it be evaluated after all other *mechanisms* in the current record have been evaluated. redirect is a *modifier*.


So it becomes short and sweet:

"The mask modifier must be evaluated before the redirect modifier."
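
A rough Python sketch of that ordering, assuming records are already parsed into a simple dictionary shape invented for this illustration (RECORDS, evaluate and the trace argument are hypothetical, and the contents of _s2.x.com are made up). Mechanisms of the current record are tried first, then the masks, and only then is the redirect followed. The macro include from the example is left out because it cannot be shown without a DNS lookup.

```python
import ipaddress

RESULT = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

# Hypothetical pre-parsed records. Masks are (prefix or None, network);
# 65/5 is assumed to mean 65.0.0.0/5 and 213/8 to mean 213.0.0.0/8.
RECORDS = {
    "x.com": {
        "mechanisms": [("+", "1.1.1.1/32"), ("+", "2.2.2.2/32")],
        "masks": [("-", "65.0.0.0/5"), (None, "213.0.0.0/8")],
        "redirect": "_s2.x.com",
    },
    "_s2.x.com": {
        "mechanisms": [("+", "66.10.0.0/16"), ("-", "0.0.0.0/0")],
    },
}


def evaluate(records, name, client_ip, trace=None):
    """Evaluate records[name]: mechanisms, then masks, then redirect."""
    if trace is not None:
        trace.append(name)  # remember which records were fetched
    ip = ipaddress.ip_address(client_ip)
    record = records[name]

    # 1. All mechanisms in the current record are evaluated first.
    for qualifier, network in record.get("mechanisms", []):
        if ip in ipaddress.ip_network(network):
            return RESULT[qualifier]

    # 2. The masks are applied before the redirect modifier.
    masks = record.get("masks", [])
    covered = any(
        ip in ipaddress.ip_network(network, strict=False) for _, network in masks
    )
    if masks and not covered:
        qualifier = next((q for q, _ in masks if q is not None), "+")
        return RESULT[qualifier]

    # 3. Only now is the redirect followed, costing another query.
    target = record.get("redirect")
    if target is not None:
        return evaluate(records, target, client_ip, trace)
    return "neutral"  # no match and no redirect: the default is '?'


if __name__ == "__main__":
    for address in ("1.1.1.1", "10.0.0.1", "66.10.1.1"):
        trace = []
        print(address, evaluate(RECORDS, "x.com", address, trace), trace)
```

For 10.0.0.1 the trace holds only x.com: the mask answers without fetching _s2.x.com, which is the query the mask is meant to save.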

Thank you for leading me to think of these corner cases.


Radu.