RE: Benefits of Schema-Aware Processors

I imagine that this question has come up in the past, but I thought I'd
ask anyway: What are the benefits of schema-aware XSL processors?


The benefits fall into two categories: robustness and optimization.
Optimization is still a theoretical, speculative benefit, so I'll
concentrate on robustness.

You can argue the case in high-level abstract terms, or with low-level
coding examples. Let's try a bit of each.

Firstly, stylesheets are written with knowledge of the input and output
schemas, but at the moment this knowledge is in the programmer's head and
isn't shared with the compiler. This means that when the programmer makes
mistakes, due to incorrect reading of the schema, or perhaps because the
schema has changed, the compiler cannot detect them. It's good software
engineering discipline to describe the inputs and outputs of a component
(the preconditions and postconditions) and this applies to XSLT as much as
anything else. The more complex the schema becomes (and some industrial
schemas are very complex indeed) the harder it is for the programmer to keep
everything in their head. In addition, it's very hard to achieve 100% test
coverage. Many schemas contain parts that are only rarely used, but if you
want to produce a production-quality stylesheet you need confidence that it
can handle everything that will be thrown at it.

At a practical level there's no doubt that debugging and testing XSLT
stylesheets is currently rather difficult, and most of us don't do it very
rigorously. We tend to test a stylesheet on a rather small sample of input
documents, and we check the output visually to see if it looks OK, perhaps
running a few sample outputs through a schema checker if we're being
conscientious. When we get things wrong it can be very hard to spot where
the trouble is, especially if it's in code written by someone else a while
back when the schema was rather different from the way it is now.

I like to demonstrate this by taking a correct stylesheet and introducing
random errors. Without a schema, these errors produce bad output that can
be very difficult to spot: in one example I can cite, out-of-range numbers
were not being highlighted as they should have been, and no one noticed.
Make the same error with a schema-aware processor and you get an explicit
error message telling you exactly what's wrong.
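
As a rough sketch of the kind of error I mean (the namespace, the schema
file measurements.xsd, and the element and attribute names are all made up
for illustration), suppose the schema declares a global element m:reading
with a decimal attribute "value" and no attribute called "val":

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:m="http://example.com/ns/measurements"
    exclude-result-prefixes="m">

  <!-- Hypothetical schema: assumed to declare m:reading with a
       required attribute "value" of type xs:decimal. -->
  <xsl:import-schema namespace="http://example.com/ns/measurements"
                     schema-location="measurements.xsd"/>

  <xsl:template match="schema-element(m:reading)">
    <span>
      <!-- Deliberate typo: @val instead of @value. Without a schema the
           test is simply false for every reading, so nothing is ever
           highlighted and the output still looks plausible. -->
      <xsl:if test="@val > 100">
        <xsl:attribute name="class">out-of-range</xsl:attribute>
      </xsl:if>
      <xsl:value-of select="@value"/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

Because the match pattern tells the processor the type of the context
node, a schema-aware processor is in a position to notice that @val can
never select anything. Whether that shows up as a warning or an error, and
whether at compile time or run time, depends on the processor.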

If you can define the schema for your input and output documents and make
this known to the XSLT processor, this can make a big difference to the
development cycle. In practical terms, the biggest benefit I've seen is from
integrated validation of the result document: if your stylesheet tries to
write invalid output, you get an error message pinpointing exactly where the
error in your stylesheet is, rather than 300 identical errors from the
schema processor telling you that the output is wrong, which you probably
knew anyway. At present with Saxon this validation is mainly done at
run-time, but more and more of it will be done at compile time, which means
that you even get to know about errors in code that hasn't been executed
because you haven't written the test data for it yet.
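
A minimal sketch of what asking for integrated validation of the result
looks like in XSLT 2.0, assuming a hypothetical output schema report.xsd
for a made-up namespace, and made-up input element names:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:r="http://example.com/ns/report">

  <!-- Hypothetical output schema: makes the declarations for the
       result vocabulary known to the processor. -->
  <xsl:import-schema namespace="http://example.com/ns/report"
                     schema-location="report.xsd"/>

  <xsl:template match="/">
    <!-- validation="strict" asks for the result tree to be validated
         against the imported schema as part of the transformation. -->
    <xsl:result-document validation="strict">
      <r:report>
        <xsl:apply-templates select="*/reading"/>
      </r:report>
    </xsl:result-document>
  </xsl:template>

  <xsl:template match="reading">
    <!-- If report.xsd declares r:value as xs:decimal, non-numeric
         content written here is a validation failure raised by the
         transformation itself. -->
    <r:value>
      <xsl:value-of select="@amount"/>
    </r:value>
  </xsl:template>

</xsl:stylesheet>
```

This needs a schema-aware processor; a basic XSLT 2.0 processor will
reject the xsl:import-schema declaration.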

I've yet to see such a big impact from using a schema for the input
document, but I think the potential is there too. The biggest potential
advantage is better reuse of stylesheet code, and better resilience to
changes in the input schema, by driving your template rules from schema
types rather than lexical patterns. This mainly applies to the kind of
complex schemas with hundreds of element types. In such schemas there is
usually some kind of type hierarchy, and if it is well-designed then you
should be able to get the kind of benefits you see from object-oriented
programming, by writing code that's generic or specific as the need arises.
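
To make that concrete, here is a sketch of template rules driven by types
rather than element names. The schema policy.xsd, its namespace, and the
types p:PartyType and p:OrganisationType (assumed to be derived from
p:PartyType) are hypothetical, as are the child element names:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:p="http://example.com/ns/policy"
    exclude-result-prefixes="p">

  <!-- Hypothetical input schema with a small type hierarchy. -->
  <xsl:import-schema namespace="http://example.com/ns/policy"
                     schema-location="policy.xsd"/>

  <!-- Generic rule: matches any element, whatever its name, whose
       type is p:PartyType or a type derived from it. -->
  <xsl:template match="element(*, p:PartyType)">
    <party>
      <xsl:value-of select="p:name"/>
    </party>
  </xsl:template>

  <!-- Specific rule for the derived type. Both patterns have the same
       default priority, so an explicit priority is given to make the
       more specific rule win. -->
  <xsl:template match="element(*, p:OrganisationType)" priority="1">
    <organisation>
      <xsl:value-of select="p:name"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="p:registration-number"/>
      <xsl:text>)</xsl:text>
    </organisation>
  </xsl:template>

</xsl:stylesheet>
```

The rules only fire if the input document has been validated against the
schema, since that is what gives the elements their type annotations. If
the schema later adds a new element of type p:PartyType, the generic rule
picks it up without any change to the stylesheet.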

I hope that gives you something to think about!

Michael Kay
http://www.saxonica.com/


I'd like to learn more about the issue in both general terms and as it
applies to the specific application I'm working on. I'm consulting with a
firm that specializes in data warehousing for the insurance industry. We
are using XML for a variety of document-production purposes (started with
a dynamically generated data dictionary, now working on dynamically
generating online help, etc.), but we are also exploring the xml->database
and database->xml potential of XSL. What benefits might we derive from
using a schema-aware processor in such an environment?

Thanks.

Jay Bryant
Bryant Communication Services

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


