spf-discuss

Re: Modifications to SPF for Mask function

2005-03-27 09:24:23
Hello, Chris,

Thank you for your comments. I will address them individually below. Point 1 remains unsolved, while the others are shown to be non-issues.


Chris Haynes wrote:
I'm sure I'm not alone in watching the current activity on a Mask function with
interest. I'm no SPF expert, but I've tried to understand SPF, and its
interactions with Sender-ID, over the last 8 months or so. There is a _huge_
amount of history here, and on MXCOMP, so I appreciate that it is probably
difficult to read all the archives and spot the critical bits of history.  The
spf drafts, however, are easily available and would repay careful attention - to
fact and to spirit.

You're right. I tried several times to make heads or tails of it, but without much luck. I wish there were an executive summary of it all.

Rather than comment point-by-point over the last few days, I thought I'd store
up some comments / observations for when you (whoever you are) seem to have a
stable proposition and to be about to draft spec. changes.

Thank you. I have a few more general modifications to bring to the draft, but I wanted to allow in-depth discussion of the most important ones first, without distracting anyone's focus with the other, less important proposals. I think we are almost ready to proceed, and I will start finalizing all the proposed changes early in the week.

1) Distinguishing the new record from Sender-ID records: There was an optimistic
suggestion that Microsoft should be told to change their identification to make
way for this new SPF version. This would not happen, but it is not necessary.
SPF has the policy prefix form v=spf1. Sender-ID uses both a different syntax
and a different version number - so the two are distinguishable. All you would
have to do is get agreement within the SPF community on the version number to
use for the version with the mask mechanism in it and it would not interfere
with Sender-ID.

There was no talk of version numbers, as they do not need to change.
The suggestion was that the PRA mechanism get its own dedicated hostname prefix (_{whatever}.domain.com) if an spf1 policy is present.

The reason is technical: the DNS reply packet has a limited payload capacity, and the space left for TXT data may be very small when there are numerous authoritative records. Combined with the load-sharing features of (most) DNS server software, this means that if the amount of TXT data found for a host name exceeds the UDP packet size, the DNS software may do load-balancing and omit one of the TXT records at random.

You can see how this hurts both SPF1 and SenderID equally, and this is counterproductive.

Since an MTA that implements PRA would likely (necessarily) implement SPF too, and since the PRA evaluation depends on a successful SPF evaluation, it is easy to see that, ideally, PRA should allow SPF to use the DNS packet space as efficiently as possible. To get to the PRA check, the MTA has to expand the SPF record completely. SPF is the front line, if you wish. So would it not make sense for PRA to leave as much of the DNS packet space as possible to SPF, in order to minimize the queries that SPF must do? SPF needing one more query to resolve (to fetch a record spread over more _spf{number} extensions because less space is available in the first packet) is not the same thing as PRA having to do another query after SPF has expanded successfully. The extra query done by SPF is more expensive than the extra query done by PRA, because SPF is on the front line and the number of evaluations required by PRA is (much) smaller than the number of evaluations done by SPF. Thus there is a (much) lower chance that the PRA query is even executed.

There will be those that will say that 1 more query doesn't matter when you are already doing a bunch.

But, keeping in mind that currently most (approx 84%) existing SPF records require fewer than 7 mechanisms, and that sharing the record space with PRA at domain.com allows each of SPF and PRA to use about 200 bytes of that packet:

When most records out there are compiled, because they happen to be hosted on a compiling DNS service (and this will happen eventually if SPF and PRA achieve success; note that the success of SPF will prompt DNS services to optimize their costs by installing compilers, not the other way around), many of those 84% of the records will compile into IP-list records that are longer than 200 bytes.

That means that when the SPF and PRA compilers each use the available space of 200 bytes, they will each have to expand their output such that they require 3 queries total for an SPF+PRA combination that used to require 2 queries before the compiler was added. This puts "the brakes" on installing compilers. In turn it puts the brakes on saving DNS traffic. In turn it puts the brakes on SPF and PRA. To some degree, I concur.

Whereas if we separated them now in the backwards compatible way I showed, the 84% would require 2 queries after compile, instead of 3 queries. When you're talking small numbers like that, a 50% unrealized savings does not look very good.

I understand that you're going from 7 queries to 3 queries, but you could go from 7 to 2, for essentially no effort other than the willingness to look far enough into the future and plan for it. Since we pretend to be so much better than Microsoft, why don't we suggest this, and perhaps earn some (more) of their respect in the process? Please let's not make this a discussion of SPF vs. Microsoft. Let's keep it at doing what's best for the future. If we cannot agree on what's best for the future, we should drop the issue and live with the inefficiency. Politics has nearly killed SPF once, why try that again? Thank you kindly.

Separating the two record types to their own 'hostnames' is a way for the two standards to coexist and cooperate in lowering the overall cost of the solution.
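To make the query arithmetic above concrete, here is a rough sketch. It rests on assumptions that are mine and not part of any draft: a usable TXT budget of about 400 bytes per reply, split evenly (about 200 bytes each) when SPF and PRA share domain.com, and overflow chunks that each cost one extra query. The function names are hypothetical.

```python
import math

def overflow_queries(record_bytes, first_budget, extra_budget):
    """Extra lookups needed for whatever does not fit in the first chunk."""
    remaining = max(0, record_bytes - first_budget)
    return math.ceil(remaining / extra_budget)

def shared_name_queries(spf_bytes, pra_bytes, packet_budget=400):
    # Assumption: both records live at domain.com, so one query returns
    # both first chunks, and each record gets half of the packet budget.
    half = packet_budget // 2
    return (1
            + overflow_queries(spf_bytes, half, packet_budget)
            + overflow_queries(pra_bytes, half, packet_budget))

def separate_name_queries(spf_bytes, pra_bytes, packet_budget=400):
    # Assumption: each record has its own host name and a whole packet
    # to itself, so each costs one query plus its own overflow.
    spf = 1 + overflow_queries(spf_bytes, packet_budget, packet_budget)
    pra = 1 + overflow_queries(pra_bytes, packet_budget, packet_budget)
    return spf + pra

# Two compiled records of 300 bytes each (an illustrative size only).
shared = shared_name_queries(300, 300)
separate = separate_name_queries(300, 300)
```

Under these assumed numbers, two 300-byte compiled records cost 3 queries when they share a name and 2 when they are separated. The model always counts the PRA lookup, which overstates its cost if the PRA check is rarely reached.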


2) If I understood the Mask proposal correctly, one of its benefits was claimed
to be that it would indicate the kind of non-pass (?, ~ or -) that would be
found if a sequence of 'includes' were followed - thus saving the need to
resolve all the includes to find out what kind of non-pass to signal.

This would be to _significantly_ misunderstand the way SPF is required to work.

Include mechanisms only contribute a 'match' or 'no-match' value to the
evaluation of the outermost policy (assuming no errors). What to return if no
match is found (neutral, soft-fail or fail) is defined _only_ in the outermost
record, it does not propagate back from 'include' mechanisms.

Not really. The compiler will take into account only those mechanisms inside included/redirected records that have an effect on the top-level record evaluation. So non-PASS mechanisms present in includes will be ignored when the compiler calculates the mask. All mechanisms (pass/no-pass) behind redirects will be included in the mask.

Besides, the compiler will not generate record extensions with includes, and especially would not put non-PASS mechanisms in those generated "included" records. That would only be a waste of space.

If the compiler has to use includes to other domains, because the mechanism is beyond the local administrative boundary, then it will not insert masks at all. But this should only happen when the compiler runs from cron and the record includes off-boundary mechs.


3) Semantics of a failure to match a mask.  My understanding of the intended
semantics of the Mask function is as follows:

    If a Policy contains one or more Mask modifiers,
    the IP address should be tested against these masks.
    If it should match one or more of the masks,
    then it is _possible_ that a match will be found
    when the Mechanisms are evaluated - so the sequence
    of Mechanisms should be evaluated (as if the mask(s)
    were not present).
    If one or more masks are present, and the IP address
    matches _none_ of these masks, then it is known that
    none of the Mechanisms will match, so the policy
    as a whole has failed to provide a match.
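The semantics described above can be sketched roughly as follows. This assumes a mask argument such as 65/5 means "first IPv4 octet / prefix length", which is my reading of the examples in this thread and not a settled syntax; parse_mask and may_match are hypothetical names.

```python
import ipaddress

def parse_mask(arg):
    """Split a mask argument such as '-65/5' into (qualifier, network)."""
    qualifier = None
    if arg[0] in "+-~?":
        qualifier, arg = arg[0], arg[1:]
    octet, length = arg.split("/")
    # Assumption: the number before the slash is the first octet of an
    # IPv4 address; strict=False clears any host bits (65/5 -> 64.0.0.0/5).
    network = ipaddress.ip_network(f"{int(octet)}.0.0.0/{int(length)}",
                                   strict=False)
    return qualifier, network

def may_match(ip, mask_args):
    """True if the IP falls inside at least one mask, i.e. the mechanisms
    still have to be evaluated; False means none of them can match."""
    addr = ipaddress.ip_address(ip)
    return any(addr in parse_mask(m)[1] for m in mask_args)

masks = ["-65/5", "213/8"]
inside = may_match("66.1.2.3", masks)    # inside 64.0.0.0/5
outside = may_match("10.0.0.1", masks)   # inside neither mask
```

A checker would use a False result to skip the mechanisms altogether, and a True result to proceed exactly as if no masks were present.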

Now - what happens in this latter case: if the IP address matches none of the
masks and this is the outermost policy? From where do you get the SPF failure
code (?, - or ~)?

The mask modifier contains that prefix. For example, m=-65/5. Please note the "-".

Only one of the listed masks needs to provide the prefix, and the checkers will use any prefix found in any of the masks. It's up to the compiler to save space by only specifying the prefix once, and also up to the compiler to not specify conflicting prefixes (which would indicate a compiler bug).

When no prefix is provided, the outer policy will be assumed to be +all. When the all mechanism is missing, the mask will be m=?65/5.

The mask really indicates "How would this evaluation end?", not necessarily what the "all" indicates, since as you say, there may not be an all.
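As a minimal sketch of the prefix rule just described: the checker takes the first prefix it finds in any mask and falls back to "+" when none carries one. The result names follow the usual SPF qualifiers; no_match_result is a hypothetical helper, and mask arguments are given without the leading "m=".

```python
# Usual SPF qualifier meanings.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def no_match_result(mask_args):
    """Result to return when the IP matches none of the masks."""
    for arg in mask_args:
        if arg and arg[0] in RESULTS:
            # Any prefix found in any mask is used; the compiler is
            # responsible for never emitting conflicting prefixes.
            return RESULTS[arg[0]]
    # No mask carries a prefix: the outer policy is assumed to be +all.
    return RESULTS["+"]

result_fail = no_match_result(["-65/5", "213/8"])
result_default = no_match_result(["65/5"])
result_neutral = no_match_result(["213/8", "?65/5"])
```

This keeps the checker simple: it does not validate the prefixes against each other, which matches the idea that any such smartness belongs in the compiler.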

Of course the "-ip4:128.0.0.0/8 -ip4:0.0.0.0/8 +all" is the same as -all, but the mask compiler is supposed to see that. After all, to compile masks it has to be proficient in applying and recognising masks itself. In this case, the top level masks should still be m=-65/5.

This complexity is a compiler implementation issue. The checker needs to do nothing smart. This works out well, since there will probably be more checker implementations than compiler implementations, so the chance of bugs is lower.

We will have to follow the approach that Wayne took, with a regression test suite that tests the compiler's output for all kinds of corner cases and gotchas. That will ensure that all compilers produce equivalent output, and the only differentiator will be how efficient that output is when checked by a checker.


There is no mandatory last mechanism in SPF (the spec. uses only the word
SHOULD). Many people put -all or ~all, but that is not _required_. The default
value is '?'.

You might suggest that you supply it as a prefix to the Mask modifier, but (a)
this breaks the existing syntax rules and (b) what happens if there are two or
more masks and they have different prefixes?

It does not break existing syntax because the prefix is given after the m= . The word following the = (the "argument") can be anything but spaces, and its syntax is specific to the modifier that uses it. So when we define the mask modifier, we'll also have to define the syntax of the "argument". But I think we've already done this in a previous discussion.

I think you need to supply an answer to this and, if it is incompatible with
existing SPF syntax rules, accept that your proposal _must_ await (or cause) a
new SPF major version.

The mask is fully backwards compatible with both currently existent records, and current implementations of SPF checkers. So it will be added to the same spec, without any need to change any version number.

The implementation of a mask is not required even in future implementations, if that's what you're worried about.

This is similar to RFC 1035: some new data types were introduced after the first standard came out, and it was optional both to support them and to implement them.

An implementation can be better by implementing the standard feature, but better is not a requirement, just a recommendation. Perhaps one day the market will make it a requirement. Until then, it's optional, backwards compatible, and can be released *now*.


4) What is to be done if there is a failure to match all supplied Masks and a
'redirect' modifier is present?  Does the Mask purport to anticipate the result
of the 'redirect' as well? Or is the 'redirect' to be activated if none of the
masks match?  Don't forget to propose changes to section 6.1, if needed.

In that case, you do what the m=-65/5 mask says, i.e. stop processing and return 'FAIL' as the SPF result.

There is currently some discussion on when to apply the mask, but I hope we'll come up with something good.

In any case, the mask should definitely be applied before the redirect is applied. That's the whole point of the mask.

My previous suggestion was that the masks be applied after any A/MX/EXISTS/IP4 mechanisms in the current record, but before any include/redirect.

Actually, it seems that in my responses above I shot down my own suggestion, because the compiler would only use includes if they pointed outside of administrative control. In that case, applying a mask on a record out of your control is definitely a no-no.

If the compiler runs in the DNS server, it would flatten some includes and just list the IPs that PASS. But it cannot flatten something like include:%{l}.domain.com, because it's not known at compile time what is to be included.

So, my new suggestion will be to apply the masks before the redirect only.

An example:

x.com : "v=spf1 ip4:1.1.1.1 ip4:2.2.2.2 ..{many others}. include:%{l}.%{d} redirect=_s2.x.com m=-65/5 m=213/8"
_s2.x.com : "v=. {ip list} ..."

The mechanism that cannot be expanded at compile time will be expanded at eval time. Since it's unknown what it contains, there cannot be a mask if the mask is evaluated before the include. Since we would still like to have a mask, we must require that it be evaluated after all other *mechanisms* in the current record have been evaluated. redirect is a *modifier*.
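A rough sketch of that evaluation order, under simplifying assumptions of my own: records are pre-parsed dictionaries, only ip4-style mechanisms are modelled (the macro include from the example is left out), masks are already expanded to CIDR form, and x.example stands in for a real domain. None of these names come from a draft.

```python
import ipaddress

RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def check(ip, name, records):
    addr = ipaddress.ip_address(ip)
    record = records[name]
    # 1. All mechanisms in the current record are evaluated first.
    for qualifier, cidr in record.get("mechanisms", []):
        if addr in ipaddress.ip_network(cidr):
            return RESULTS[qualifier]
    # 2. Masks are applied next, before the redirect is followed.
    masks = record.get("masks", [])
    if masks and not any(addr in ipaddress.ip_network(c) for _, c in masks):
        prefix = next((q for q, _ in masks if q), "+")
        return RESULTS[prefix]
    # 3. Only now is the redirect modifier followed.
    target = record.get("redirect")
    if target:
        return check(ip, target, records)
    return "neutral"  # default when nothing matched

records = {
    "x.example": {
        "mechanisms": [("+", "1.1.1.1/32"), ("+", "2.2.2.2/32")],
        "masks": [("-", "64.0.0.0/5"), (None, "213.0.0.0/8")],
        "redirect": "_s2.x.example",
    },
    "_s2.x.example": {"mechanisms": [("+", "66.10.0.0/16")]},
}

listed = check("1.1.1.1", "x.example", records)      # matched locally
masked_out = check("10.0.0.1", "x.example", records)  # stopped by the mask
redirected = check("66.10.1.1", "x.example", records) # found via redirect
```

The point of the ordering is visible in the second call: the address is outside every mask, so the checker returns the mask's result without ever querying the redirect target.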

So it becomes short and sweet:

"The mask modifier must be evaluated before the redirect modifier."

Thank you for leading me to think of these corner cases.


Radu.