On Extensibility in MARID Records

I've been asked to defend the proposition that MARID records need more
extensibility than can be easily afforded by SPF syntax.  This thread is
an attempt to do so, using two possible extensions.  To paraphrase a line
often credited to Mark Twain (though it goes back to Pascal), I apologize
for writing such a long email -- I didn't have time to write a short one.

To set the stage, let's consider Rich Segal's comments at the interim
meeting.  He says IBM will publish a record something like:
    v=spf1 +a:xxx +a:yyy +a:zzz -exists:%{i}.rbl.com ?all
This says that if a message comes from one of the servers he knows
about, it's good.  If a message comes from a blacklisted address, it's
bad.  Otherwise, he's not sure.

Now, he would like to be able to change the final "?all" to "-all", but
he's not sure he's found all of the servers that send mail on IBM's
behalf.  Therefore, he needs feedback about mail that matches the "?all"
so that he can refine his list.  At first blush, he could say something
like:
    v=spf1 +a:xxx +a:yyy +a:zzz -exists:%{i}.rbl.com ?all
        feedback=feedback@ibm.com

But that raises the question of what feedback he wants.  In order to get
feedback on mail that matches the "?all", the "feedback=" needs to bind
to the "?all".  So, we could invent syntax like:
    v=spf1 +a:xxx +a:yyy +a:zzz -exists:%{i}.rbl.com
        ?all/feedback=feedback@ibm.com
(note the slash).  This would be a step forward.  It's also possible
that their legal people would like feedback about mail that matches the
"-exists", so that they can sue the offenders.  So maybe their record
ends up looking like:
    v=spf1 +a:xxx +a:yyy +a:zzz
        -exists:%{i}.rbl.com/feedback=abuse@ibm.com
        ?all/feedback=feedback@ibm.com

Now, for purposes of refining their "?all", they may only want headers
of messages that match, but for prosecuting abusers, they may need the
entire message.  So they might want to publish something like:
    v=spf1 +a:xxx +a:yyy +a:zzz
        -exists:%{i}.rbl.com/feedback=abuse@ibm.com,ret=full
        ?all/feedback=feedback@ibm.com,ret=hdrs
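
As a rough illustration of what a processor would have to do with the
slash syntax above, here is a minimal Python sketch.  The grammar it
assumes (qualifier, mechanism, optional ":value", then an optional
"/name=value,name=value" tail) is only my reading of the examples in this
message, not anything defined by SPF, and the function names are made up
for the example.  It also ignores the fact that SPF already uses "/" for
CIDR lengths, which a real design would have to reconcile.

```python
import re

# Hypothetical grammar, inferred from the examples in this message:
#   [qualifier] mechanism [":" value] ["/" name "=" value {"," name "=" value}]
TERM_RE = re.compile(
    r"^(?P<qual>[+\-?~]?)"      # optional qualifier, defaults to "+"
    r"(?P<mech>[a-z0-9]+)"      # mechanism name, e.g. "a", "exists", "all"
    r"(?::(?P<arg>[^/]*))?"     # optional ":value", up to the first slash
    r"(?:/(?P<ext>.*))?$"       # optional slash-bound extension parameters
)


def parse_term(term):
    """Split one term into qualifier, mechanism, argument and parameters."""
    m = TERM_RE.match(term)
    if m is None:
        raise ValueError(f"cannot parse term: {term!r}")
    params = {}
    if m.group("ext"):
        for item in m.group("ext").split(","):
            name, _, value = item.partition("=")
            params[name] = value
    return {
        "qualifier": m.group("qual") or "+",
        "mechanism": m.group("mech"),
        "arg": m.group("arg"),
        "params": params,
    }


def parse_record(record):
    """Parse a whitespace-separated record that starts with v=spf1."""
    version, *terms = record.split()
    if version != "v=spf1":
        raise ValueError(f"unexpected version tag: {version!r}")
    return [parse_term(t) for t in terms]
```

Note that the sketch stores "feedback" and "ret" side by side in one flat
dictionary per mechanism, so it has no way to record which parameter
modifies which.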

Now, IBM may very well want to monitor the activity of the third party
they contract with that uses address zzz, so that mechanism might look
like:
    +a:zzz/feedback=monitor@ibm.com,ret=hdrs

All of the above has had to do with a hypothetical "feedback" extension,
with various parameters.  It's also possible that there will be
extensions that classify the mail coming from various sources.  For
example, if the address range +a:zzz is used by IBM for sending bulk
mail, IBM may want the mechanism to say:
    +a:zzz/type=bulk

Putting the two extensions together, we get:
    +a:zzz/feedback=monitor@ibm.com,ret=hdrs,type=bulk

Now we're really pushing the limits. In the above example, "ret=hdrs"
modifies "feedback", which modifies the mechanism.  But "type=bulk" just
modifies the mechanism.  A flat, comma-separated list can't show that
difference in binding, so the above example has syntactic ambiguity
built in.  We need a heavier structure to resolve the ambiguity: brackets
or parentheses or tags or some such.  We could get there with
parentheses by writing something like:
    +a:zzz(feedback:monitor@ibm.com(ret=hdrs))(type=bulk)

But now we've already left the realm of a simple regex parser -- the
possibility of nesting implies at least a push-down parser.  If we ever
expect to be able to extend SPF records, we need to define the
extensible syntax now.  (Not the details of "feedback" or "type", but
the syntax for telling where each extension starts and ends.)  That
syntax needs to account for special characters, like email addresses
that contain an "=" or ":".  Processors that we deploy today need to be
able to parse (and discard) this stuff, even though things like the
"feedback" extension may not be defined for another year or two.  By the
time we've got enough extensibility, the complexity of a conforming
parser approaches that of XML.
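
To make the push-down point concrete, here is a minimal Python sketch of
a recursive parser for the parenthesized form above.  The grammar (a head
string followed by zero or more parenthesized groups, each of which has
the same shape) is an assumption drawn from that single example, and the
function names are hypothetical.  The sketch has no escaping rule, so a
value containing a parenthesis would break it, which is exactly the
special-character problem mentioned above.

```python
import re


def parse_nested_term(text):
    """Parse 'head(child(grandchild))(child)' into a tree of dicts."""
    node, pos = _parse_node(text, 0)
    if pos != len(text):
        raise ValueError(f"unexpected character at offset {pos}")
    return node


def _parse_node(text, pos):
    # The head runs up to the next parenthesis (or the end of the text).
    start = pos
    while pos < len(text) and text[pos] not in "()":
        pos += 1
    node = {"text": text[start:pos], "children": []}
    # Each "(" opens a nested node; the recursion is the push-down part.
    while pos < len(text) and text[pos] == "(":
        child, pos = _parse_node(text, pos + 1)
        if pos >= len(text) or text[pos] != ")":
            raise ValueError("missing ')'")
        pos += 1
        node["children"].append(child)
    return node, pos


def known_extensions(node, known):
    """Keep the child extensions whose name is in 'known'; drop the rest."""
    kept = []
    for child in node["children"]:
        name = re.split(r"[:=]", child["text"], maxsplit=1)[0]
        if name in known:
            kept.append(child)
    return kept
```

A processor deployed before "feedback" is defined could call
known_extensions(tree, {"type"}) and still get a usable result: the
unknown extension is parsed, nesting and all, and then discarded.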


XML is rich enough to handle all of the above.  The spec is well
debugged.  XML Schemas include a mechanism for defining what's legal in
a document, even if part of what's legal is tags that haven't been
defined yet.  There are dozens of extant parsers -- for Linux, for Unix,
for Windows, for S/390 -- written in C, C++, Java, Perl, Python, C# and
probably even INTERCAL.
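
For comparison, here is a sketch of the same information carried in XML
and read with Python's standard xml.etree.ElementTree module.  The element
and attribute names (record, mechanism, feedback, class, futureExtension)
are invented for this example; no MARID or SPF document defines them.  The
point is only that an off-the-shelf parser handles the nesting, and that
skipping an extension it doesn't know is a one-line filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of two mechanisms from the examples above.
RECORD = """\
<record version="1">
  <mechanism qualifier="+" type="a" value="zzz">
    <feedback to="monitor@ibm.com" ret="hdrs"/>
    <class type="bulk"/>
    <futureExtension level="3"/>
  </mechanism>
  <mechanism qualifier="?" type="all">
    <feedback to="feedback@ibm.com" ret="hdrs"/>
  </mechanism>
</record>
"""


def read_record(xml_text, known=frozenset({"feedback"})):
    """Return (qualifier, type, value, extensions) for each mechanism.

    Child elements whose tag is not in 'known' are parsed and ignored.
    """
    root = ET.fromstring(xml_text)
    result = []
    for mech in root.findall("mechanism"):
        extensions = {
            child.tag: dict(child.attrib)
            for child in mech
            if child.tag in known
        }
        result.append(
            (mech.get("qualifier", "+"), mech.get("type"), mech.get("value"), extensions)
        )
    return result
```

Here "ret" is unambiguously an attribute of the feedback element, while
the bulk classification sits beside it as a sibling, so the binding
question from the flat syntax doesn't arise.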

So, XML fits the requirements that I've outlined above, and it's the
only thing that I know of that does.  Of course, it would be possible to
design an SPF-like language that meets these requirements, but "possible"
isn't enough.  It hasn't been done, and I doubt that we would get it right on
the first try if we did attempt it.