Re: draft-schlitt-spf-00pre4 now available

In <1099485113(_dot_)18416(_dot_)166(_dot_)camel(_at_)code3> James Couzens 
<jcouzens(_at_)6o4(_dot_)ca> writes:

Again, I am very interested in any feedback about this document.  I
want to know how it conflicts with existing SPF implementations and
how it differs from what you think SPF-classic should be.


Thank you very much for your comments James.  I appreciate them.

* The "HELO" identity is explicitly defined


I'm torn here.

1) Its a bit ambigious as to WHEN one should perform a HELO evaluation.


Hmmm..   I think that is something that can be left up to the
implementor.

2) When a HELO check fails, the session fails.  This is great if "you
MUST perform HELO evaluations", and everyone does it but we both know
that so many clients out there don't even do HELO, or pass it
forged/useless information.


Yeah, I agree, lots of people pass garbage for the HELO.

I think you should word it like this:

[description snipped]


I believe that my wording has the same semantics as those found in
spf-draft-200406 and that was a major goal of mine.  I believe that
changes in semantics from that draft need to be very carefully thought
out.

I happen do disagree with your proposal to require HELO checking, but
I think maybe we should start a different thread to debate that.

* NXDOMAIN causes a "None" result rather than a "fail".  Note that
  resolvers will return NXDOMAIN when presented with a malformed
  domain names.

  This is kind of messy.  The various SPF classic specs have been
  inconsistent on this subject, and the various implementations that
  I've checked have not been consistent.   *sigh*.


I disagree... NXDOMAIN should return unknown.  This is the first
occurrence of conflicting logic.


Well, as I said, this is messy.  The various specs conflict on this
subject.  Both libspf2 and Roger Moser's implementation currently
return None rather than Unknown/PermError.   In reality, most MTAs
check for this anyway and this is a receiver policy, not a sender
policy. 

I could probably be convinced to change the libspf2 doc and my
implementation, but it was easiest for me to choose "None" over
"PermError".

* process limit:  DNS packets that are too large MAY be ignored.


Ignoring leads to no discovery of ignorant / poorly published records.
Elsewhere you have made changes with the mindset that errors or problems
are caught and raised.  This is the second occurrence of conflicting
logic.


The problem is that all too many firewalls simply drop DNS over TCP
packets.  Yes, dropping TCP/53 packets is braindead.  This change is
really an acknowledgement of reality, rather than a strong core
belief.

For example, consider:

example.org.  TXT   "v=spf1 include:_large_spf_record.example.com -all"

_large_spf_record.example.com  TXT  "v=spf1 <a couple KB of mechanisms>"


The current SPF specs *require* you to conform to the check_host()
algorithm (or equivalent in earlier drafts).  If you are behind a
firewall that drops TCP/53 packets, you can't do this.  Mail admins
often don't control the firewalls, so there is nothing they can do to
fix the problem.  Heck, they many not even be aware that there is a
problem. 

All this wording does is say that it is ok for you to run SPF checks
in such an environment.

* process limits:  a limit of 10 PTR lookups per ptr: mechanism
  evaluation.


I would prefer an overall recursion limit period.  This offers more
flexibility whilst still providing some security.

* process limit:  no more than 10 mechanisms that do DNS lookups.

* process limit:  removed limit on the number of include: mechanisms
  because that is a subset of 10 mechanism limit.


*UGH*  This is getting overly complex.  I re-state my desire for a
simple global level of recursion.  Its more flexible, and a lot less
complex then having to track recursion for now four different things?
IMO this is unnecessary.  A global counter makes more sense.



I think it is incumbent on people who write specs and deploy software
not to allow their systems to be used to amplify DDoS attacks.

Do the tcpdumps and the math.  The overall recursion limit is not
enough security to prevent large amplification factors.


I agree that the wording in the spec is somewhat messy, but in
practice, publishers just need to count the number of mechanisms and
make sure they have less than 10.

* Unknown macro variables are syntax errors.


Not really sure if I like or don't like this... it affects extensibility
somewhat, but its still warranted I suppose.


The major reason for getting rid of unknown macro variables is that to
actually specify correct ABNF and semantics, you end up with a lot of
space being used up in the spec.

As I mentioned at the beginning of my post, I've tried hard to remove
stuff every time I add stuff so that the spec won't just keep getting
longer.  I felt that other stuff was more worthy of the space.

* security consideration:  don't let data provided by third parties
  cause problems.


..... if you include it, its your fault.  I like this better.  If you
include something thats broken, the spec should provide wording
appropriate to "a failure anywhere as a result of any included or
redirected query results in a warning or notification of some kind being
added to the Received-SPF: header" or something.


My description is too terse.  I'm not talking about includes or
redirects, I'm talking about senders that put non-ascii characters
into things like the HELO domain, the MAIL FROM, or SPF records.  I'm
talking about malicous senders that provide data that tries to trick
or corrupt the Received-SPF: headers or the explanation string.

Most SPF implementations that I'm aware of already do cleanup of
invalid charachters and such. (the "sanitize" option)  This is just
putting something in the spec to make people think about it.

Overall I like it.


Glad to hear it.

I VEHEMENTLY disagree with:

* all syntax errors must be detected, not just when they are
  "encountered". 

You should not be specifying things like this.  It is unnecessary to
parse the entire SPF record.  Doing so requires a pre-parse, and its a
waste.  Currently you can halt whenever you encounter an explicit result
or error.  If I find one error I do not need to go and find more.  There
is only a single method or placeholder for returning an error, and no
way to specify that multiple errors were found, nor should there be.


You do not have to detect all syntax errors, you just have to detect
that at least one exist and return PermError.

You do not have to pre-parse the SPF record, you can just continue
parsing after you have found an answer, but not do any more DNS
lookups or change the result.

All of the the SPF specs say that you must return PermError when you
"encounter" a syntax error, but it is never defined what exactly
"encountered" means.  Meng's Mail::SPF::Query code never, ever, under
any circumstances, manages to "encounter" a syntax error.


This is probably personal preference, but when it comes to security
issues and email, I like well defined and consistent results.  I don't
think SPF records like "v=spf1 a (#*LSLK$" should sometimes return a
pass and sometimes return a syntax error.  I don't like the fuzzyness
of whether "v=spf1 -all accredit=_spf.%(d)" means that the syntax
error in the macro variable is "encountered" because you you have to
look for the "exp=" modifier.


The wording I use is short and clear.

This is bad bad bad, and entirely unnecessary.  SPF is already enough of
a pig with the DNS queries, suffice to say, being able to abort the
parse upon an explicit result is a huge win, please do not take this
away.


No, stopping the parsing is not a huge win.  It saves only a very
small amount of CPU time.

Also:

Four times your changes logically conflict with the overall tone in your
changes.  I dislike this lack of consistency.  Let me explain.  Overall
I get the tone that the changes are to aide in problems being found and
reported, so that overall SPF records do not sit idle in DNS for months
on end failing mail with no warning.  Am I on track?  Four times you
specify changes that completely conflict with this:


Yes, you are on track.

#1
--

* NXDOMAIN causes a "None" result rather than a "fail".  Note that

  resolvers will return NXDOMAIN when presented with a malformed
  domain names.


My spec does not return a "fail", is this a typo, or is this specified
in another draft?


NXDOMAIN returning fail is part of draft-lentczner-spf-00.

My draft has what I believe to be correct:

If the domain does not exist (NXDOMAIN), SPF clients MUST return 
   "unknown".


You should use this.  Its correct.


I outline my reasoning above, so I'll skip repeating it here.

#2
--

* process limit:  DNS packets that are too large MAY be ignored.


Bad!  This again goes against the overall tone of your draft changes.
If a packet is ignored, then how do we know it happened?

I recommend this sentence be removed or changed to read:

* process limit: DNS packets that are too large MUST NOT be ignored and
MUST cause the client to return error


As I mention above, my wording isn't something I like as much as
something that I acknowledge as reality.

#3
--

* all syntax errors must be detected, not just when they are
  "encountered". 

I've already been over this one, but it is indeed the third logical
discrepancy that I've encountered.  There is no value in doing this.
PLEASE do not include it.


Again, I've explained above, so I won't repeat it ihere.

#4
--

* the ptr: mechanism and the %{p} macro ignore DNS errors instead of
  triggering TempError. 

This is the fourth one which again conflicts completely with the overall
tone presented here, and even in view of ignoring that, its a bad idea.
Its a bad idea because you should not IGNORE anything.  What if your
spell checker opted to ignore words without user input?  We are parsing
records for validity.  We need to be black and white here.  There should
be no MAY's, and only MUST.  I realize that this probably isn't likely,
but at the very least, you can't go about ignoring DNS related errors.
You __MUST__ return error.  How else will people know?  Sure DNS is
flakey, we accepted that when we started using DNS to be host to our
records.


Ok, here is the reason that ptr: and %{p} have to ignore DNS errors.

Say a domain publishes an SPF record that says "v=spf1 ptr -all".  Say
that they have made sure that all their hosts have valid rDNS PTR
records.  Everything looks fine.

Now, some spammer comes along and sends email claiming to be from this
domain.  This spammer is on a part of the internet where all rDNS
lookups return SERVFAIL forever.

What happens?  Well, under the current draft, the DNS errors are
supposed to trigger a TempError, which should cause the email to be
retried later.  However, since the rDNS problem will never go away,
the email just gets stuck in the system.

Now, this whole thing becomes a more serious problem when you use the
trusted-forwarder whitelist.  As per a request by Meng, I use a %{p}
macro in order to do domain name based whitelisting in addition to IP
based whitelisting.

So, what happens is that all SPF checks that fail cause a check in the
T-FWL.  This cause the use of a %{p} macro, which causes lots of email
to back up.

You are right, my changes to say that DNS errors must be ignored for
ptr: and %{p} run counter to my desire to have more strict and
consistent error checking.  However, I don't see any other option.

I honestly think you should drop the entire 'libspf2' aspect of this,
and I urge that you work further with the rest of this group to publish
and submit a proper SPF draft.  Its clear you are capable of it.  This
document suffers because of personal preference decisions as opposed to
being logically consistent.



Sorry, but I'm not going to try and fight Meng and MarkL in the IETF
over which draft should be used.


-wayne