Re: On Extensibility in MARID Records


On Wed, 2004-06-16 at 09:37, Jim Lyon wrote:

I've been asked to defend the proposition that MARID records need more
extensibility than can be easily afforded by SPF syntax.  This thread
is an attempt to do so, using two possible extensions.  To paraphrase
Mark Twain, I apologize for writing such a long email -- I didn't have
time to write a short one.

To set the stage, let's consider Rich Segal's comments at the interim
meeting.  He says IBM will publish a record something like:

    v=spf1 +a:xxx +a:yyy +a:zzz -exists:%{i}.rbl.com ?all

This says that if a message comes from one of the servers he knows
about, it's good.  If a message comes from a blacklisted address, it's
bad.  Otherwise, he's not sure.
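
A minimal sketch of how a receiver might read that record, assuming
first-match-wins evaluation as in SPF.  The IP addresses and the two
lookup sets are placeholders standing in for the a: and exists: lookups;
they are not anything IBM publishes.

```python
# Hypothetical stand-ins for the DNS lookups behind a:xxx/yyy/zzz and rbl.com.
KNOWN_SENDERS = {"192.0.2.10", "192.0.2.11", "192.0.2.12"}
BLACKLISTED = {"203.0.113.7"}

RESULTS = {"+": "pass", "-": "fail", "?": "neutral"}

def evaluate(ip):
    """Return the result for the first mechanism that matches ip."""
    mechanisms = [
        ("+", lambda addr: addr in KNOWN_SENDERS),  # +a:xxx +a:yyy +a:zzz
        ("-", lambda addr: addr in BLACKLISTED),    # -exists:%{i}.rbl.com
        ("?", lambda addr: True),                   # ?all
    ]
    for qualifier, matches in mechanisms:
        if matches(ip):
            return RESULTS[qualifier]
    return "neutral"  # unreachable here because ?all always matches
```

Any sender outside both sets falls through to "?all" and comes back
neutral, which is exactly the case he wants feedback about.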

Now, he would like to be able to change the final "?all" to "-all",
but he's not sure he's found all of the servers that send mail on
IBM's behalf.  Therefore, he needs feedback about mail that matches
the "?all" so that he can refine his list.  At first blush, he could
say something like:

    v=spf1 +a:xxx +a:yyy +a:zzz -exists:%{i}.rbl.com ?all
        feedback=feedback@ibm.com

But that raises the question of what feedback he wants.  In order to get
feedback on mail that matches the "?all", the "feedback=" needs to
bind to the "?all".  So, we could invent syntax like:

    v=spf1 +a:xxx +a:yyy +a:zzz -exists:%{i}.rbl.com
        ?all/feedback=feedback@ibm.com

(note the slash).  This would be a step forward.  It's also possible
that their legal people would like feedback about mail that matches
the "-exists", so that they can sue the offenders.  So maybe their
record ends up looking like:

    v=spf1 +a:xxx +a:yyy +a:zzz
        -exists:%{i}.rbl.com/feedback=abuse@ibm.com
        ?all/feedback=feedback@ibm.com

Now, for purposes of refining their "?all", they may only want headers
of messages that match, but for prosecuting abusers, they may need the
entire message.  So they might want to publish something like:

    v=spf1 +a:xxx +a:yyy +a:zzz
        -exists:%{i}.rbl.com/feedback=abuse@ibm.com,ret=full
        ?all/feedback=feedback@ibm.com,ret=hdrs

Now, IBM may very well want to monitor the activity of the third party
they contract with that uses address zzz, so that mechanism might look
like

    +a:zzz/feedback=monitor@ibm.com,ret=hdrs
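
A rough sketch of a parser for the slash syntax as used so far, assuming
whitespace separates terms, "/" separates a mechanism from its options,
and "," separates options.  The function names are made up for
illustration.  Real SPF also uses "/" for CIDR prefix lengths, which
this sketch ignores.

```python
def parse_term(term):
    """Split 'mechanism/key=value,key=value' into (mechanism, options)."""
    mechanism, _, tail = term.partition("/")
    options = {}
    if tail:
        for item in tail.split(","):
            key, _, value = item.partition("=")
            options[key] = value
    return mechanism, options

def parse_record(record):
    """Split a whole record into its version tag and parsed terms."""
    version, *terms = record.split()
    return version, [parse_term(term) for term in terms]
```

A flat dictionary is all this syntax can express: every option hangs
directly off the mechanism, with no way to say that one option belongs
to another.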

All of the above has had to do with a hypothetical "feedback"
extension, with various parameters.  It's also possible that there
will be extensions that classify the mail coming from various
sources.  For example, if the address range +a:zzz is used by IBM for
sending bulk mail, the mechanism may want to say

    +a:zzz/type=bulk

Putting the two extensions together, we get

    +a:zzz/feedback=monitor@ibm.com,ret=hdrs,type=bulk

Now we're really pushing the limits. In the above example, "ret=hdrs"
modifies "feedback", which modifies the mechanism.  But "type=bulk"
just modifies the mechanism.  So the above example has syntactic
ambiguity built in. We need a heavier structure to resolve the
ambiguity: brackets or parentheses or tags or some such.  We could get
there with parentheses by writing something like:

    +a:zzz(feedback:monitor@ibm.com(ret=hdrs))(type=bulk)

But now we've already left the realm of a simple regex parser -- the
possibility of nesting implies at least a push-down parser.  If we
ever expect to be able to extend SPF records, we need to define the
extensible syntax now.  (Not the details of "feedback" or "type", but
the syntax for telling where each extension starts and ends).  That
syntax needs to account for special characters, like email addresses
that contain an "=" or ":".  Processors that we deploy today need to
be able to parse (and discard) this stuff, even though things like the
"feedback" extension may not be defined for another year or two.  By
the time we've got enough extensibility, the complexity of a
conforming parser approaches that of XML.
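
To make the nesting point concrete, here is a small recursive-descent
sketch for the parenthesized form, with the address written plainly as
monitor@ibm.com.  It assumes a term is a label followed by zero or more
parenthesized sub-terms, and it deliberately does no escaping, so a
label containing "(" or ")" would break it.  The function names and the
dictionary layout are illustrative only.

```python
def parse_node(text, pos=0):
    """Parse 'label(child)(child)...' from pos; return (node, next_pos)."""
    start = pos
    while pos < len(text) and text[pos] not in "()":
        pos += 1
    node = {"label": text[start:pos], "children": []}
    while pos < len(text) and text[pos] == "(":
        # Recursion plays the role of the push-down stack.
        child, pos = parse_node(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise ValueError("missing ')' at offset %d" % pos)
        node["children"].append(child)
        pos += 1  # consume the closing ')'
    return node, pos

def parse_term(text):
    """Parse one complete term and reject trailing garbage."""
    node, pos = parse_node(text)
    if pos != len(text):
        raise ValueError("unexpected %r at offset %d" % (text[pos], pos))
    return node
```

Parsing "+a:zzz(feedback:monitor@ibm.com(ret=hdrs))(type=bulk)" yields a
tree in which "ret=hdrs" sits under "feedback" while "type=bulk" sits
directly under the mechanism, which is the distinction the flat comma
list could not make.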


XML is rich enough to handle all of the above.  The spec is well
debugged.  XML Schemas include a mechanism for defining what's legal
in a document, even if part of what's legal is tags that haven't been
defined yet.  There are dozens of extant parsers -- for Linux, for
Unix, for Windows, for S/390 -- written in C, C++, Java, Perl, Python,
C# and probably even Intercal.
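
As an illustration of parsing and discarding what is not yet
understood, here is a sketch using Python's standard
xml.etree.ElementTree.  The element and attribute names (policy,
mechanism, feedback, type, to, ret) are invented for this example and
are not taken from any MARID draft.

```python
import xml.etree.ElementTree as ET

RECORD = """
<policy>
  <mechanism qualifier="+" type="a" value="zzz">
    <feedback to="monitor@ibm.com" ret="hdrs"/>
    <type>bulk</type>
  </mechanism>
  <mechanism qualifier="?" type="all">
    <feedback to="feedback@ibm.com" ret="hdrs"/>
  </mechanism>
</policy>
"""

# A processor deployed before the hypothetical 'type' extension existed.
KNOWN_EXTENSIONS = {"feedback"}

def read_policy(xml_text):
    """Return (mechanism, understood extensions, skipped extensions) triples."""
    result = []
    for mech in ET.fromstring(xml_text.strip()).findall("mechanism"):
        understood = [c.tag for c in mech if c.tag in KNOWN_EXTENSIONS]
        skipped = [c.tag for c in mech if c.tag not in KNOWN_EXTENSIONS]
        name = mech.get("qualifier") + mech.get("type")
        result.append((name, understood, skipped))
    return result
```

The older processor still finds both mechanisms and the nesting of each
extension; it simply steps over the element it does not recognize.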

So, XML fits the requirements that I've outlined above, and it's the
only thing that I know of that does.  Of course, it would be possible
to design an SPF-like language that meets these, but "possible" isn't
enough.  It hasn't been done, and I doubt that we would get it right
on the first try if we did attempt it.


XML's extensibility could be arranged so that changes are possible only
with the cooperation of the standards process.  Lock down what a related
RR may contain by making the headers virtual and standalone, and bar
additional definitions.  To change the permitted content, change the
version declaration so that it references a new, expanded virtual
header.  Adding features that "might" be useful is emblematic of the
"everything including the kitchen sink" approach that seems to prevail
whenever text and XML are considered.  DNS is not an HTTP server.

It seems a wild claim that an organization or ISP will be able to
define in this single record all possible avenues employed by its
users.  The most an open list provides is a check mark added next to the
mail subject line.

Can these extra functions be done using other methods?  

Claiming that records may be chained, or that some may use TCP, does not
help.  If such innovation cannot be confined to a simple expression with
constrained results, then add a single _mail-vouch._tcp.my-domain SRV
record that points to a real HTTP server, with the SRV record also
declaring the port to be used.  Now all things web-like become possible.
Use dynamic techniques to allow users to add to or subtract from their
list automatically.  Done this way, it raises no great concern for the
health and function of DNS or mail.  It can happen at the MUA, where it
can scale and perhaps even be cached, since much of the information
will likely be repetitive at the user level.  A mail-marking scheme does
not improve the basic mail infrastructure enough to be worth considering
at the MTA level.
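
A sketch of the client side of that idea, assuming the SRV answer has
already been fetched and decoded into its priority, weight, port and
target fields.  The DNS lookup itself is left out, the host names are
placeholders, and the tie-break on weight is a simplification: real SRV
processing picks among equal-priority records at random in proportion
to weight.

```python
from typing import NamedTuple

class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str

def vouch_url(records, path="/"):
    """Build an HTTP URL from the preferred SRV record, or None if there is none."""
    if not records:
        return None
    # Lowest priority wins; higher weight breaks ties (simplified).
    best = min(records, key=lambda r: (r.priority, -r.weight))
    return "http://%s:%d%s" % (best.target.rstrip("."), best.port, path)
```

Everything richer than "which server and which port" then lives behind
that URL rather than in the DNS record.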

-Doug