RE: [xsl] Static Validation of XSL Transformations

Colin Adams wrote:

From: Anders Møller <amoeller(_at_)brics(_dot_)dk>
>Given an XSLT 2.0 stylesheet, S, and two schemas, D_in and D_out, the
>tool is able to check statically that all output of S at runtime is
>valid according to D_out assuming that the input is valid according to
>D_in. Additionally, the tool produces a flow graph of S. Schemas are
>written in either DTD, XML Schema, or Restricted RELAX NG.
>A research paper describing the analysis is also available from the web
>site.


That was very interesting.

Does it cope with doc/document/collection functions? I couldn't work out from your paper how it would do that.

No, multiple inputs are currently not supported, but I believe that many typical cases could be handled with little effort.


Michael Kay wrote:

An interesting (and very thorough) paper.

Like Colin, I was wondering as I read it how you deal with multiple input
documents (and schemas). And indeed, multiple output documents. One of the
problems is that when you look at a template rule in isolation, you really
don't know whether it is writing part of the final result tree, or some
intermediate temporary tree whose structure is statically unknown.

We're not just looking at template rules in isolation - a central part of the analysis is to figure out the control flow between the templates.


Compared with Saxon's static analysis and type checking, you seem to be
making a lot of what one might call "95% guesses". For example, you seem to
be assuming that if you see <xsl:template match="x">, and there is an
element x in the schema, then the actual element is going to be a valid
instance of that type. (What if there's more than one x in the schema, a
global one and a local one?) Saxon doesn't make this assumption; if you want
this kind of checking, you have to write match="schema-element(x)". Of
course the assumption is likely to be right 95% of the time, and for a
"lint" kind of tool that's fine, but I don't feel it's appropriate for a
production compiler. (We had a long debate about this at a W3C meeting under
the heading of "the assumption of validity").
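
The difference between the two match patterns can be sketched as follows. This is only an illustration, assuming a schema-aware XSLT 2.0 processor and a hypothetical schema file input.xsd that declares a global element x in no namespace. The explicit priority values are there so that the example does not depend on default priority rules.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical schema location: assumed to declare a global element x. -->
  <xsl:import-schema schema-location="input.xsd"/>

  <!-- Matches only x elements that have been validated against the
       global declaration of x (or a member of its substitution group). -->
  <xsl:template match="schema-element(x)" priority="2">
    <validated-x/>
  </xsl:template>

  <!-- Matches any element named x, validated or not. Nothing here tells
       the processor which schema type the element has. -->
  <xsl:template match="x" priority="1">
    <unvalidated-x/>
  </xsl:template>

</xsl:stylesheet>
```

With the first rule the processor can rely on the declared type of x when analysing the template body; with the second it can assume nothing beyond the element name.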

Yes, our approach is a pragmatic program analysis. I view this as a supplement to static type checking, not as an alternative. And I'm also a big fan of static type checking :-)

The validator (based on dk.brics.schematools - see http://www.brics.dk/schematools/) handles overloading (your example with more than one x) without problems. I'm not sure which assumptions you mean: the analysis is *sound* (i.e. over-approximating) - at least, that's the intention.


Clearly the more complex the stylesheet becomes, in terms of handling
multiple inputs and temporary trees, the more likely it is that there will
be data around that for some good reason does NOT conform to the input
schema.

The other technique that you're using which Saxon doesn't currently use is
to analyze across apply-templates calls. I think there's probably quite a
lot of mileage in doing this. In fact Saxon doesn't even analyze across
call-template or function calls unless you actually declare the type of the
result of the target template or function.
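
For readers unfamiliar with such declarations, here is a minimal sketch of what declaring the result type of a function and of a named template looks like in XSLT 2.0. The names f:label and make-row and the namespace urn:example:functions are invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- The "as" attribute declares the result type, so a caller can be
       type-checked without looking inside the function body. -->
  <xsl:function name="f:label" as="xs:string">
    <xsl:param name="n" as="xs:integer"/>
    <xsl:sequence select="concat('item-', $n)"/>
  </xsl:function>

  <!-- The same idea for a named template: it promises exactly one
       row element as its result. -->
  <xsl:template name="make-row" as="element(row)">
    <xsl:param name="n" as="xs:integer" required="yes"/>
    <row id="{f:label($n)}"/>
  </xsl:template>

</xsl:stylesheet>
```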

One thing that's interesting about the paper, I think, is to see how much
can be done simply with knowledge of the input and output schemas, without
any other changes to the stylesheet to incorporate type declarations and to
explicitly invoke validation. There's a lot to be said for this, because
XSLT programmers take a lot of re-educating to declare types of variables
and parameters and the more you can do in the absence of such declarations,
the better. I've been moving in similar directions with Saxon in some of the
optimizations that are done, for example trying to infer when

<xsl:variable name="x"><xsl:value-of select="EXP"/></xsl:variable>

can be safely rewritten as

<xsl:variable name="x" select="EXP"/>

by analyzing the expressions in which $x is used. Of course, when you're
doing optimization a 95% guess certainly isn't good enough, you need to be
absolutely sound. (Though you can use guesses to decide between different
evaluation strategies, of course, provided that the strategy still works if
the guess is wrong.)
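
To see why the uses of $x matter for this rewrite, consider a small sketch in which the two forms are observably different. The input <list><item>a</item><item>b</item></list> and the variable names are assumptions made for the example. It shows only the XSLT 2.0 semantics, not how Saxon performs the inference.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="list">
    <!-- A temporary tree: one document node whose only child is a
         text node holding the space-separated string values. -->
    <xsl:variable name="tree"><xsl:value-of select="item"/></xsl:variable>

    <!-- The selected nodes themselves: a sequence of item elements. -->
    <xsl:variable name="seq" select="item"/>

    <result>
      <!-- One document node versus one entry per item element. -->
      <tree-count><xsl:value-of select="count($tree)"/></tree-count>
      <seq-count><xsl:value-of select="count($seq)"/></seq-count>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

If the stylesheet only ever uses the variable as a string and EXP selects a single value, the two forms are much harder to tell apart, which is the kind of case where the rewrite can be considered.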

It's interesting to note that you ask the user for two pieces of
information: the schema for the input document and the schema for the
result. In XSLT 2.0 we don't actually provide syntax to allow the user to
declare the schema for the input document. We tried to tackle that a number
of times but never quite found a solution that worked, but I'm really not
quite sure why not! It's partly of course that a complex stylesheet can have
multiple entry points and can be designed to do more than one job depending
on the entry point that you choose. But again, that's the 5% case rather
than the 95%.


Thank you for your interesting comments! :-)

--
Anders Moeller
amoeller(_at_)brics(_dot_)dk
http://www.brics.dk/~amoeller
