Discussion of MS's pro-XML arguments

Meng,

| 1.    We believe it's critical to have an architecture for
| uncoordinated extensibility of the information published about a
| domain's email policies. Once deployed, we expect (and indeed hope) that
| others will build on top of what is initially defined with new ideas and
| functionality. They need to be able to do this without the need to act
| through some all-powerful central coordinating authority yet still be
| assured that their extensions both won't conflict with those of others
| and also won't disturb the operation of existing non-extension-aware
| interpreters of the data. XML already has a flushed-out and mature
| architecture for doing this (its namespace support and the wildcard
| infrastructure in XML Schema are critical pieces of it), one that was
| developed through a significant learning curve that would be both
| arduous and error-prone to repeat.

"Uncoordinated extensibility" is not a good goal for something you want
universally deployed. It looks good on paper, but in reality
everybody and his uncle implements the missing features himself, so
that by the time you start working on version 2.0 of your standard,
there are lots of different, conflicting proposals for solving the same
problem that are already deployed at some sites and are
mission-critical there. This means you'll have a hell of a time getting
everybody to agree on version 2.0 of the standard.

| 2.    There already exists an incredible variety of deployed XML
| parsing tools available in a wide array of languages on virtually every
| platform anyone might care to want one for. These help raise the
| interoperability bar, in that by using these tools applications can
| avoid introducing inadvertent lexical and scanning problems due to bugs
| or specification ambiguities: the XML tree model semantically projected
| by these tools (it's so-called "document object model") makes it more
| difficult for these problems to creep in. Among other issues, for
| example, complications caused by issues of character set encodings are
| already handled. This even helps avoid things like the buffer-overrun
| errors that have lead to so many security alerts in recent years :-).

XML is good for serializing object graphs. But we're not dealing with
graphs in SPF, so XML is not the right tool. As for available
libraries: experience shows that programs over 1000 lines of code are
never error-free. Just follow Bugtraq for a few days and you'll be
amazed how many (exploitable!) errors still get found in very mature
code. Imagine that all SPF users have to read XML. There are only a
handful of popular libraries for C(++) or Java, so each library will be
used in a significant percentage of SPF tools. Now imagine that an
exploitable error is detected in one of these libraries. Then all a
hacker needs to do is write an exploit (in XML), publish it as a TXT
record in the DNS for a domain that is under his control (or hack a DNS
server), and send some spam that pretends to come from this domain. In
a few minutes he will 0wn a substantial percentage of the
well-connected mail servers on the Internet. And there is no way for
the owners of vulnerable mail servers to escape this attack: they would
have to either block SMTP and DNS or shut down their servers, neither
of which is an option. SPF, as it stands, is very easy to parse, so
there will be lots of different implementations of the parser, which
will limit the overall effect on the Internet if one of them gets
compromised.
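
To give an idea of how little code "very easy to parse" means, here is
a minimal sketch of a tokenizer for a record such as
"v=spf1 a mx ip4:192.0.2.0/24 -all". It is an illustration only, not
the parser from the draft: the function name and the tuple layout are
my own assumptions, and it ignores macros and most error cases.

```python
QUALIFIERS = "+-~?"  # pass, fail, softfail, neutral


def parse_spf(record):
    """Split an SPF record into (kind, qualifier, name, value) tuples.

    Hypothetical helper for illustration; it only tokenizes, it does not
    evaluate the record or expand macros.
    """
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF version 1 record")

    parsed = []
    for term in terms[1:]:
        qualifier = "+"  # a missing qualifier means "pass"
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]

        # The name ends at the first ':', '/' or '=' (or at end of term).
        cut = len(term)
        for i, ch in enumerate(term):
            if ch in ":/=":
                cut = i
                break
        name, rest = term[:cut], term[cut:]
        if not name:
            raise ValueError("term without a name")

        kind = "modifier" if rest.startswith("=") else "mechanism"
        # Drop the ':' or '=' separator; keep a leading '/' (CIDR length).
        value = rest[1:] if rest[:1] in (":", "=") else rest
        parsed.append((kind, qualifier, name.lower(), value))
    return parsed
```

Anybody can write, read, and audit something of this size in an
afternoon, which is exactly why I expect many independent
implementations.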

| 3.    These deployed parsing tools build upon the quite mature and
| polished XML syntax and lexical specification
| <http://www.w3.org/TR/REC-xml> , again helping to assure
| interoperability. The API of these tools is often based on the equally
| well established document object model <http://www.w3.org/DOM/> ,
| helping to assure portability of client code.

The fact that something is standardized doesn't mean that looking for
something else is always a bad idea. In Germany, we use A4 for
letters. Does this mean we should use this format for anything and
everything, like book sizes, screen sizes, plate sizes? Nobody would
think of this. One size does not fit all. Even in cookie-cutter
Germany. And once you cross the ocean, there is a totally different,
yet equally well-established system: Legal and Letter sizes.


| 4.    The XML Schema <http://www.w3.org/TR/xmlschema-0/>  definition
| language exists and is mature. For applications trying to use XML, like
| Caller-ID, this provides value by given a means to denote
| application-specific syntactic intent in such a way that generic schema
| validation tools can verify that the structure of a given XML document
| is well-formed and syntactically valid from an application point of
| view. The ability to provide this sort of formal syntactic specification
| is also crucial to being able to support uncoordinated extensibility in
| a robust way: as an interpreter of data, you need to formally know where
| someone might put in some new datum you need to robustly be prepared for
| and skip over vs. other places where you can assume more tightly you
| understand what the data looks like.

(E)BNF is even more mature, and has a sound mathematical foundation.
Every CS graduate in the civilized world had to attend a formal
languages / compiler construction class. Compared to that, XML Schema
has a long way to go. The fact that in the XML world you need
complicated tools simply to check the validity of an argument is an
indication that things have grown way too complex here. Anders
Hejlsberg's design of the metadata part in C# has shown that there are
ways to design extensibility into a language in a limited and
manageable way.

| 5.    Validating parsing engines are also broadly deployed. Beyond
| just the syntactic parsing and verification performed by the lower level
| engines, applications using validating engines can be assured that data
| they are about to interpret conforms to the structural syntactic that
| the application expects. As a result, error checking and validation code
| in applications is reduced, and greater interoperability results.

This is a very dangerous way of programming! Parameter checking needs
to be done all over the program. Not checking parameters and trusting
the caller has led us into the mess we have right now.
Also, the fact that you need a complex, validating parser before you
can even touch the data tells us that complexity is way too high for
us to manage it in a reliable way.
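
What I mean by checking parameters all over the program can be
sketched like this. The function below is a hypothetical helper (the
name and signature are my own assumption, not part of any SPF
implementation) that tests a client address against an ip4/ip6 network
and validates both arguments itself, instead of trusting that some
validating parser further up the call chain has already done so.

```python
import ipaddress


def ip_matches(client_ip, network):
    """Return True if client_ip lies inside network, e.g. '192.0.2.0/24'."""
    # Check the arguments right here, even if the caller claims to have
    # validated them already.
    if not isinstance(client_ip, str) or not isinstance(network, str):
        raise TypeError("client_ip and network must be strings")
    try:
        addr = ipaddress.ip_address(client_ip)
        net = ipaddress.ip_network(network, strict=False)
    except ValueError as exc:
        raise ValueError(f"malformed address or network: {exc}") from exc

    # An IPv4 address can never match an IPv6 network, and vice versa.
    if addr.version != net.version:
        return False
    return addr in net
```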

| 6.    A number of rich auxiliary architectures have already been
| defined for XML, notable among them XML Encryption and XML Signature.
| Being able to leverage these infrastructures provides powerful
| possibilities for future enhancements to Caller-ID. For example, it is
| entirely trivial to create a signed Caller-ID Email Policy Document: one
| merely uses the XML Signature's "enveloped signature" mechanism in one
| of the document's wildcardable extensibility points. This works with
| existing tools and existing credential management mechanisms, and all
| that using only a handful of lines of new code to be able to put it all
| together. Being able to selectively encrypt parts of a policy document
| while keeping other parts open is also quite likely of significant
| interest to some publishers. All this just comes architecturally for
| free; duplicating the designs for a custom data model would be a huge
| amount of work. Similar synergies also exist with other XML
| architectures, such as XML Query and its relation to databases, though
| the utility there is less viscerally obvious.

No, it does not come for free. All these libraries need to be compiled
into SPF-aware systems, which makes them even more fragile. The goal of
programmers of SPF systems is to find each and every mistake in their
software. The goal of a hacker is to find just one teeny weeny error
and exploit it. Each additional line of code, and there are thousands
of them in each library, makes the system more fragile and error-prone.
This XML vision is a vision of a fragile house of cards.

And it totally ignores our limit of 512 bytes for DNS UDP packets.
Does anybody seriously think s/he will be able to put an XML-encoded,
encrypted policy description on top of an SPF description in 512 bytes?
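
The arithmetic is easy to check. The sketch below estimates the size
of a DNS response that carries a single TXT record and compares it
with the 512-byte limit. It rests on assumptions I should state: one
answer record, no authority or additional sections, the answer name
compressed to a 2-byte pointer, and function names that are purely my
own. The 600-byte payload merely stands in for a larger policy
document; it is not a real Caller-ID record.

```python
UDP_LIMIT = 512  # classic limit for a DNS message over UDP, in bytes


def txt_rdata_len(text):
    """Length of TXT RDATA: the data plus one length byte per 255-byte string."""
    data = text.encode("utf-8")
    chunks = max(1, -(-len(data) // 255))  # ceiling division, at least one string
    return len(data) + chunks


def response_size(domain, text):
    """Estimated size of a response holding one TXT answer for domain."""
    labels = domain.rstrip(".").split(".")
    # Each label costs one length byte; the root label adds one more byte.
    qname = sum(1 + len(label.encode("ascii")) for label in labels) + 1
    header = 12
    question = qname + 4  # QTYPE + QCLASS
    # Compressed name pointer (2) + TYPE, CLASS, TTL, RDLENGTH (10) + RDATA.
    answer = 2 + 10 + txt_rdata_len(text)
    return header + question + answer


def fits_in_udp(domain, text):
    """True if the estimated response stays within the UDP limit."""
    return response_size(domain, text) <= UDP_LIMIT


spf = "v=spf1 a mx ip4:192.0.2.0/24 -all"
print(response_size("example.com", spf), fits_in_udp("example.com", spf))
print(fits_in_udp("example.com", "x" * 600))
```

A typical SPF record leaves plenty of room; anything that grows past a
few hundred bytes does not.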

| 7.    There is a huge vibrant industry out there building a large
| variety of XML tools to address various needs. A product like XML Spy
| <http://www.xmlspy.com/> , for example is a powerful XML development
| environment (XML Spy is what I used to produce the XML syntactic
| diagrams in the Caller-ID specification), and it has at least a dozen
| significant competitors. Other authoring environments like Visual Studio
| and other text editors already have XML text coloring and optimized
| keyboard navigation built-in. Internet Explorer has nice hierarchical
| XML rendering and browsing available just by loading a .xml file. The
| list goes on. Also, various transformation tools and architectures like
| the XSLT <http://www.w3.org/Style/XSL/>  stylesheet language provide
| rich declarative means by which raw XML data can be automatically
| transformed into other forms such as human-presentable HTML.

There also is a huge and vibrant industry out there in the red-light
district of Stuttgart. The fact that a lot of people get lured into a
certain area doesn't mean that we should go there, too.

The fact that you need these tools underlines that the complexity of
these descriptions is so high that humans can't handle them anymore,
or even check their validity. How can we trust something that we can't
check (=debug) anymore?

| 8.    There exists a large extant body of technical professionals
| already educated and trained in XML.

There exists an even larger body of technical professionals who can
read and write ASCII strings.

| 9.    There exists a dedicated industry (books, seminars, etc) of
| companies and people working to educate same and grow this pool. The
| investment put into educating oneself in XML is leveraged knowledge
| beyond just administering a spam-deterrence infrastructure that can help
| one's career grow and expand outward.

Anybody who works in the IT field can buy an XML Bible, read through
it over the weekend, and understand the principles. The principles are
not difficult; it's the complexity of the application that is the
killer.

Learning something new is always rewarding. But where is the
connection between a personal advantage (increased knowledge) and the
SPF standard? I don't see the connection.


Carsten

-------
Sender Permitted From: http://spf.pobox.com/
Archives at http://archives.listbox.com/spf-discuss/current/
Latest draft at http://spf.pobox.com/draft-mengwong-spf-02.9.4.txt