spf-discuss

Re: Re: Why XML

2004-06-22 15:05:48
On Tue, 22 Jun 2004 20:38:08 +0200, Hadmut Danisch 
<hadmut(_at_)danisch(_dot_)de> wrote:
> I do trust a parser generated by a parser generator or a common XML
> library much more than any hand-coded quick and dirty SPF parser.
> There is not automated parsing tool for SPF, and a syntax like SPF
> invites for a quick and dirty implementation. Eats valid SPF records.
> But that's how buffer overruns are generated.

The problem is that XML is just not that simple - it has implications
that you cannot ignore.

For example, XML has at least 2 different inclusion methods

  http://www.w3.org/TR/2004/REC-xml-20040204/#sec-external-ent
  http://www.w3.org/TR/2003/WD-xinclude-20031110/#include_element

As an example of the first one at least, consider the following XML
record

 <?xml version="1.0"?>
 <!DOCTYPE record [ <!ENTITY entries SYSTEM "http://www.schmerg.com/entries.txt"> ]>
 <record> &entries; </record>

where the named URL is a text file containing an XML fragment like
 
    <item>First</item>
    <item>Second<sub>Nested</sub></item>

Any conforming XML parser sees the second line of the XML record as
defining a new entity (like '&lt;', '&amp;' etc.) called 'entries' whose
value is the contents of the specified file, so when this entity is
referenced in the 3rd line ('&entries;') we effectively have a #include
mechanism.

Sure enough, if you parse this with expat or similar you'll just see the
net result 

  <record>
    <item>First</item>
    <item>Second<sub>Nested</sub></item>
  </record>

[This is a very handy way to do log files that you can simply append to
but still have well-formed XML]

So my SPF XML parser now needs to understand arbitrary file inclusions,
including how to retrieve such files. How many of the http, ftp, tftp,
smb etc. protocols have I just included, and how many of these have
buffer overflow and similar exploits?
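To make the retrieval point concrete, here is a minimal sketch using
Python's expat binding. The FRAGMENT constant and the fetch callable are
stand-ins I have made up so that nothing touches the network; a real
resolver would need an HTTP (or ftp, smb, ...) client at that point.

```python
import xml.parsers.expat

RECORD = b"""<?xml version="1.0"?>
<!DOCTYPE record [ <!ENTITY entries SYSTEM "http://www.schmerg.com/entries.txt"> ]>
<record> &entries; </record>"""

# Stand-in for the remote file (hypothetical; no network access is made).
FRAGMENT = b"<item>First</item>\n<item>Second<sub>Nested</sub></item>\n"


def parse_record(doc, fetch=None):
    """Return (element names seen, system IDs the parser asked us to fetch)."""
    seen, requested = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append(name)

    def resolve(context, base, system_id, public_id):
        # expat hands us the URL; fetching it is entirely our problem.
        requested.append(system_id)
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(fetch(system_id), True)
        return 1  # non-zero tells expat to carry on

    if fetch is not None:
        parser.ExternalEntityRefHandler = resolve
    parser.Parse(doc, True)
    return seen, requested


if __name__ == "__main__":
    print(parse_record(RECORD, fetch=lambda url: FRAGMENT))
    print(parse_record(RECORD))
```

With the resolver installed, the element names should come out as record,
item, item, sub, and the URL shows up in the list of things the application
was asked to fetch. Without it, only record is seen and the entity
reference is silently dropped, so the two setups disagree about what the
record contains.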

And if you use a "quick'n'dirty" XML parser that doesn't understand
and do this (and given the DNS TXT record limits and the verbosity of
XML, someone is going to do this eventually), you're broken, and you've
written your own special parser, so all your arguments about "standard
parsing code" vanish.

If there were a definition of a reasonable subset of XML that dropped
all the extended stuff, I'd sort of agree, but AFAIK if you're going to
do XML you have to do the whole thing, and I don't like the idea of my
hardened minimal email firewall invoking expat and a whole HTTP library
on each incoming email (have you seen the initialisation time on
Sablotron and expat?).

Specialist grammars may not be perfect, but IMHO we'd do better to
reserve reasonable extensibility (by, for example, the embedded
grammar version number that starts each record) rather than pay the
whole XML-and-his-dog price.
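
As a rough illustration of version-gated extensibility, the sketch below
assumes a record starts with a tag such as "v=spf1" followed by
space-separated terms. The function name and the SUPPORTED set are my own
inventions for the example, not anything from a spec.

```python
# Grammar versions this (hypothetical) checker knows how to parse.
SUPPORTED = {"spf1"}


def read_record(txt):
    """Return (version, terms), or None if the record is not one we understand."""
    fields = txt.split()
    if not fields or not fields[0].startswith("v="):
        return None  # no version tag: not a record for us
    version = fields[0][2:].lower()
    if version not in SUPPORTED:
        return None  # unknown grammar version: ignore rather than guess
    return version, fields[1:]


if __name__ == "__main__":
    print(read_record("v=spf1 a mx -all"))
    print(read_record("v=spf2 something-new"))
```

A later grammar can then change the syntax freely, and an old parser
simply declines the record instead of misreading it.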

Regards

--
Tim

