Re: A 30% solution


In <p0610112bbcc7e584f658@[216.43.25.67]> Pete Resnick
<presnick@qualcomm.com> writes:

On 5/11/04 at 9:48 PM -0500, wayne wrote:

The MARID records will contain entries that (when fully resolved)
will give the receiver two sets of IP addresses,


Two points:

1) SPF creates a set for "DNS errors" to deal with situations when
something cannot be fully resolved due to a temporary name server
problem.  I think this is an important set and should be kept.

2) You actually describe three sets.  The third being the set of IP
addresses that are not known to be legitimate or illegitimate.


Actually, there is no third set in my model. What you get back is
either legitimate or illegitimate addresses.


Every time you define a set, you in effect also define its
complementary set.  Just because the domain defines two sets in your
proposal doesn't mean that your proposal only deals with two sets.  I
think it is important to make it clear that an IP address in your
proposal can fall into more than just one of two sets.
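To make this concrete, here is a minimal Python sketch, assuming a
domain publishes a legitimate set L and an illegitimate set I as CIDR
networks.  The helper name `classify` and the sample networks are
hypothetical, not part of any MARID draft.  Even with only two
published sets, an address can end up in four places: L only, I only,
both, or neither.

```python
import ipaddress

def classify(ip, legit, illegit):
    """Hypothetical helper: place an IPv4 address relative to the two
    published sets.  'neither' is the implied complement of L union I."""
    addr = ipaddress.ip_address(ip)
    in_l = any(addr in ipaddress.ip_network(n) for n in legit)
    in_i = any(addr in ipaddress.ip_network(n) for n in illegit)
    if in_l and in_i:
        return "both"          # contradictory publication
    if in_l:
        return "legitimate"
    if in_i:
        return "illegitimate"
    return "neither"           # the domain said nothing about this address

# Illustrative data only.
legit = ["1.2.3.4/32", "5.6.7.8/32"]
illegit = ["1.2.3.0/24"]
for ip in ["1.2.3.4", "1.2.3.9", "5.6.7.8", "9.9.9.9"]:
    print(ip, classify(ip, legit, illegit))
```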

Because of that, I'm not sure that the "DNS error" set makes sense:
Failure to get back an address is failure to get back an address,
whether due to exhausting the records, or due to a temporary DNS
failure.


I think error conditions need to be handled differently from the case
where the domain owner simply hasn't defined a situation.
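A rough sketch of that distinction, assuming a hypothetical `lookup`
callable that either returns the published (legitimate, illegitimate)
address sets or raises a stand-in `TemporaryDNSError` on a resolver
timeout or SERVFAIL.  None of these names come from SPF or MARID; the
point is only that a temporary failure gets its own result instead of
being folded into "neither".

```python
from enum import Enum

class Result(Enum):
    LEGITIMATE = "legitimate"
    ILLEGITIMATE = "illegitimate"
    NEITHER = "neither"        # domain owner didn't define this address
    DNS_ERROR = "dns-error"    # temporary failure: try again later

class TemporaryDNSError(Exception):
    """Hypothetical stand-in for a resolver timeout or SERVFAIL."""

def evaluate(ip, lookup):
    try:
        legit, illegit = lookup()
    except TemporaryDNSError:
        # Not the same as NEITHER: the domain may well have listed ip.
        return Result.DNS_ERROR
    if ip in legit:
        return Result.LEGITIMATE
    if ip in illegit:
        return Result.ILLEGITIMATE
    return Result.NEITHER

def working_lookup():
    return {"1.2.3.4"}, set()

def failing_lookup():
    raise TemporaryDNSError("SERVFAIL")

print(evaluate("1.2.3.4", working_lookup))   # Result.LEGITIMATE
print(evaluate("1.2.3.4", failing_lookup))   # Result.DNS_ERROR
print(evaluate("9.9.9.9", working_lookup))   # Result.NEITHER
```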

This is not just a syntactical issue.  You can't depend on the order
of the records in an RR set.


Really? As I understood, many systems depend on getting records back
in a particular order for round-robin applications. Or do you mean
that there is simply not a *guaranteed* order because UDP packets
might arrive out of order?


Because many DNS servers will return records in round-robin order,
and because that order depends on other queries made by other systems,
an individual system can't depend on the order of the records in an RR
set.

What happens if an IP address is in more than one set?  Would a MARID
client need to process all MARID records in order to determine the
outcome, or can you short-circuit the checks?


I think it would be reasonable to say that an IP address appearing as
both legitimate and illegitimate is a "configuration error" for all
intents and purposes, and short-circuiting is a reasonable
optimization.


Yes, but then this configuration problem of an IP address being in
both sets will show up only some of the time, making debugging much
harder.
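A small simulation of why the error shows up only some of the time.
The records are hypothetical (kind, address) pairs with 1.2.3.4 listed
in both sets, `collections.deque.rotate` stands in for a round-robin
name server reordering the RR set between queries, and the evaluator
short-circuits on the first record that names the address.

```python
from collections import deque

# Hypothetical MARID RR set; 1.2.3.4 is (wrongly) in both sets.
records = deque([
    ("legitimate", "1.2.3.4"),
    ("illegitimate", "1.2.3.4"),
    ("legitimate", "5.6.7.8"),
])

def first_match(ip, rrset):
    """Short-circuit: return the kind of the first record naming ip."""
    for kind, addr in rrset:
        if addr == ip:
            return kind
    return "neither"

answers = []
for _ in range(len(records)):
    answers.append(first_match("1.2.3.4", records))
    records.rotate(1)  # the server returns the RR set in a new order
print(answers)  # ['legitimate', 'legitimate', 'illegitimate']
```

Two queries out of three accept the address and one rejects it, which
is exactly the intermittent behavior that makes this hard to debug.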


Syntax and semantics can be very hard to separate because what you
mean (semantics) is often limited by what you can say (syntax).


So, by using an RR set (syntax), we can't express what it means when an
IP address is in more than one set.  This could be solved by adding a
priority field to the RRs (new syntax) so that the evaluation order
would be fixed.
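A sketch of the priority-field idea, assuming a hypothetical record
layout of (priority, kind, address) where lower priorities are
evaluated first.  Sorting before evaluating makes the answer the same
no matter how the server ordered the RR set.  Sorting on the whole
tuple also breaks ties between equal priorities deterministically; how
a real record format should treat ties is left open here.

```python
import random

# Hypothetical records with an added priority field (lower = earlier).
records = [
    (10, "illegitimate", "1.2.3.4"),
    (20, "legitimate", "1.2.3.4"),
    (30, "legitimate", "5.6.7.8"),
]

def evaluate(ip, rrset):
    """Evaluate in priority order, independent of arrival order."""
    for _priority, kind, addr in sorted(rrset):
        if addr == ip:
            return kind
    return "neither"

results = set()
for _ in range(20):
    shuffled = random.sample(records, len(records))  # arbitrary RR order
    results.add(evaluate("1.2.3.4", shuffled))
print(results)  # {'illegitimate'}
```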

IIRC, Dave Crocker is right.  Two-level expressions, either ANDs of
ORed variables or ORs of ANDed variables (and the equivalent
INTERSECTION/UNION set notation), are sufficient to express any
boolean function.  Those normal forms (conjunctive and disjunctive
normal form, CNF and DNF) have some nice properties, but they usually
don't have the least number of operators or references to variables.
Note that both forms need to be able to complement a variable.
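The classic construction behind the "two levels are enough" claim is
the sum of minterms: list every input row where the function is true
and OR together one AND term per row.  This Python sketch (the helper
name is hypothetical) builds a disjunctive normal form that way, and
also shows the two caveats: it needs the complement operator, and the
result is far from minimal.

```python
from itertools import product

def dnf_from_truth_table(names, f):
    """Build a DNF (an OR of ANDed literals) for any boolean function f
    by emitting one term per input row where f is true."""
    terms = []
    for values in product([False, True], repeat=len(names)):
        if f(*values):
            literals = [n if v else "!" + n for n, v in zip(names, values)]
            terms.append("(" + " & ".join(literals) + ")")
    return " | ".join(terms) if terms else "FALSE"

# Exclusive-or needs a complemented variable in every term.
print(dnf_from_truth_table(["a", "b"], lambda a, b: a != b))
# (!a & b) | (a & !b)

# Plain "a | b" becomes three terms and six variable references.
print(dnf_from_truth_table(["a", "b"], lambda a, b: a or b))
# (!a & b) | (a & !b) | (a & b)
```

If each variable reference cost a DNS lookup, the unminimized form of
"a | b" would triple the work.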

The two-level normal forms are often useful intermediate steps in
creating an optimized circuit/expression.  Once you have an expression
in one of these forms, you can then try to minimize the number of
variables or minimize the number of operations or both.

For MARID, it is very important to minimize both the number of
variables (DNS lookups) and the number of operators (length of the
expression in the MARID DNS records).  So, I think that, while
theoretically elegant, the two-level normal forms are not at all
appropriate for our work.


Ok, back to Pete's proposal.

Say I want to express the idea that the IP addresses 1.2.3.4 and
5.6.7.8 are the only legitimate senders of email claiming to be from
example.com.  The "legitimate set" (L) is easy to define; it needs
two MARID records.  The "illegitimate set" (I), however, needs to
cover all IP addresses other than those two.  The (implied) syntax
that Pete proposes makes expressing this tedious, since there is no way
to give a complementary set.

Pete didn't specify whether the IP range data would be in the form
a.b.c.d-w.x.y.z or in CIDR notation, h.i.j.k/nn.  The latter is
shorter most of the time, but since the gaps around 1.2.3.4 and
5.6.7.8 don't fall on CIDR boundaries, you would have to create even
more MARID records to specify set I (illegitimate IP addresses).
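To put numbers on it, here is a Python sketch that computes set I for
the example above in both notations.  The helpers `complement_ranges`
and `complement_cidrs` are hypothetical; `ipaddress.summarize_address_range`
is the standard library call that splits a first-last range into the
fewest CIDR blocks.

```python
import ipaddress

MAX_IPV4 = 2**32 - 1

def complement_ranges(legitimate):
    """Hypothetical helper: every IPv4 address NOT in `legitimate`,
    as inclusive (first, last) integer ranges (a.b.c.d-w.x.y.z form)."""
    ranges = []
    start = 0
    for addr in sorted({int(ipaddress.IPv4Address(a)) for a in legitimate}):
        if addr > start:
            ranges.append((start, addr - 1))
        start = addr + 1
    if start <= MAX_IPV4:
        ranges.append((start, MAX_IPV4))
    return ranges

def complement_cidrs(legitimate):
    """The same set expressed as h.i.j.k/nn blocks."""
    blocks = []
    for first, last in complement_ranges(legitimate):
        blocks.extend(ipaddress.summarize_address_range(
            ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))
    return blocks

legit = ["1.2.3.4", "5.6.7.8"]
for first, last in complement_ranges(legit):
    print(f"{ipaddress.IPv4Address(first)}-{ipaddress.IPv4Address(last)}")
print(len(complement_cidrs(legit)), "CIDR blocks")
```

The range form needs three records for set I (0.0.0.0-1.2.3.3,
1.2.3.5-5.6.7.7 and 5.6.7.9-255.255.255.255), while the CIDR form
needs 57.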


Now, in practice, example.com will probably not really care about
listing specific IP addresses; rather, they will want to specify
host names, which Pete's proposal allows.  Actually, they may well
want to just say "out MTAs" == "in MTAs" (the hosts that send their
mail are their MX hosts), but that isn't allowed under Pete's
proposal.

So, example.com has:

example.com.  MX  smtp.example.com.
smtp.example.com.  A  1.2.3.4
example.com.  MX  secondary.example.net.

and under the control of example.net:

secondary.example.net.  A  5.6.7.8

Now, even if you add the complement-set operator, you can't easily
express the set of illegitimate IP addresses, since !smtp.example.com
includes the IP address of secondary.example.net and vice versa.
Moreover, example.com may have no idea when example.net changes the IP
address for secondary.example.net.
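A Python sketch of both problems, using a hypothetical A-record table
that mirrors the zone data above (`addrs`, `naive_illegitimate` and
`illegitimate` are illustrative names only).  Complementing a single
host name rejects the other MX, and the correct answer, the complement
of the union of all MX addresses, is only right if it is computed from
example.net's current data.

```python
import ipaddress

# Hypothetical zone data mirroring the example above.  example.com
# controls the first entry; example.net controls the second.
A_RECORDS = {
    "smtp.example.com": {"1.2.3.4"},
    "secondary.example.net": {"5.6.7.8"},
}
MX_HOSTS = ["smtp.example.com", "secondary.example.net"]

def addrs(host, a_records):
    """Resolve a host name against the (hypothetical) A-record table."""
    return {ipaddress.ip_address(a) for a in a_records[host]}

def naive_illegitimate(ip, a_records):
    """'!smtp.example.com': anything that is not the primary MX."""
    return ipaddress.ip_address(ip) not in addrs("smtp.example.com", a_records)

def illegitimate(ip, a_records):
    """Complement of the union of every MX host's addresses."""
    legit = set().union(*(addrs(h, a_records) for h in MX_HOSTS))
    return ipaddress.ip_address(ip) not in legit

print(naive_illegitimate("5.6.7.8", A_RECORDS))  # True: secondary MX rejected
print(illegitimate("5.6.7.8", A_RECORDS))        # False

# example.net renumbers its secondary without telling example.com.
renumbered = {**A_RECORDS, "secondary.example.net": {"5.6.7.9"}}
print(illegitimate("5.6.7.8", renumbered))       # True
print(illegitimate("5.6.7.9", renumbered))       # False
```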


Pete's proposal was, in his words, a strawman proposal.  What I'm
trying to point out here is not so much that Pete's proposal has
problems, but that it is really hard to separate syntax from semantics
and that theoretically clean and elegant systems often conflict with
the messy real-world situations that SMTP exists in.


-wayne