xsl-list

RE: SAX filters (was Re: Generating NCRs)

2003-01-17 02:04:33
Michael Kay wrote:
Use a text editor, or perhaps a SAX filter, to replace "&amp;#" by
"&#". Why use a power drill when you can do the job with a hammer?

A long while ago, Dave Pawson told me that you and I mention SAX filters
a bit too often to ignore in the FAQ. Can we maybe put together a list
of use cases for when a SAX filter is more appropriate than XSLT, and
maybe a brief demo of one?


No time to do a proper job on this, but here's a starter for ten (sorry,
that's a catchphrase from a UK TV programme).

1. A SAX filter can sometimes be used instead of an XSLT transformation,
and it can sometimes be used for pre-processing the input to an XSLT
transformation, or for post-processing the output.

2. A SAX filter is mainly useful:

   (a) when the XML file is too large to be processed by XSLT
   (b) when you need to perform operations (usually text processing)
that can't be done easily in XSLT
   (c) when you need to preserve information that the XSLT/XPath data
model does not retain

3. To solve problems of document size, you can:

   (a) do all the processing in a SAX application (if the processing is
simple and purely serial)
   (b) use a preprocessing SAX filter to create a smaller input document
for the transformation to work with (e.g. by projection or restriction)
   (c) use a preprocessing SAX filter to split the large document into
many small documents, each of which is then transformed independently by
XSLT. If necessary, you can then use a postprocessing SAX filter to put
the transformed pieces back together again.
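
As a rough sketch of 3(b), here is a projection filter that drops whole
subtrees before the data reaches the transformation. It uses Python's
standard xml.sax module rather than Java, purely to keep it short. The
names DropSubtrees and project, and the element names in the usage note,
are made up for illustration.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class DropSubtrees(XMLFilterBase):
    """Projection: suppress every element with an unwanted name, plus its content."""

    def __init__(self, parent, unwanted):
        super().__init__(parent)
        self._unwanted = set(unwanted)
        self._skip_depth = 0  # greater than 0 while inside a suppressed subtree

    def startElement(self, name, attrs):
        if self._skip_depth or name in self._unwanted:
            self._skip_depth += 1
        else:
            super().startElement(name, attrs)

    def endElement(self, name):
        if self._skip_depth:
            self._skip_depth -= 1
        else:
            super().endElement(name)

    def characters(self, content):
        if not self._skip_depth:
            super().characters(content)

    def ignorableWhitespace(self, chars):
        if not self._skip_depth:
            super().ignorableWhitespace(chars)


def project(xml_text, unwanted):
    """Run the filter over a string and return the reduced document as text."""
    out = io.StringIO()
    reader = DropSubtrees(xml.sax.make_parser(), unwanted)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

Calling project(doc, ["notes"]) on an order document would pass through
everything except the <notes> elements. In a real pipeline the filter
output would go straight to the XSLT processor instead of being
serialized first.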

4. A SAX filter can be used to transform the input data into a form that
is more amenable to XSLT processing. Examples include:

   (a) preparsing a structured text field (e.g. CSV) into a set of
separate elements
   (b) changing the representation of a date field to the ISO 8601 form
yyyy-mm-dd
   (c) computing a derived attribute, e.g. adding @value as the product
of @price and @qty, making it easier for the XSLT stylesheet to do
sorting and totalling.
   (d) simple grouping of elements, for example adding a <list> element
around any consecutive sequence of one or more <list-item> elements
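
To illustrate 4(c), a filter along the following lines could add the
derived attribute. Again this is Python's xml.sax rather than Java, and
only a sketch: the class name AddValue and the helper add_values are
invented here, and it assumes @price and @qty always hold valid decimal
numbers.

```python
import io
import xml.sax
from decimal import Decimal
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class AddValue(XMLFilterBase):
    """Add value = price * qty to any element that carries both attributes."""

    def startElement(self, name, attrs):
        if "price" in attrs and "qty" in attrs:
            enriched = dict(attrs.items())
            # Decimal avoids binary floating-point noise in money values.
            product = Decimal(attrs["price"]) * Decimal(attrs["qty"])
            enriched["value"] = str(product)
            attrs = AttributesImpl(enriched)
        super().startElement(name, attrs)


def add_values(xml_text):
    """Run the filter over a string and return the enriched document as text."""
    out = io.StringIO()
    reader = AddValue(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

The stylesheet can then sort on @value or sum(item/@value) directly,
with no arithmetic of its own.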

5. A SAX filter can be used to capture features of the source document
that are not representable in the XSLT data model. For example, entity
references and CDATA sections, as well as DTD declarations, can all be
captured in a SAX filter and translated into elements that are visible
to the XSLT stylesheet.
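
A sketch of the CDATA case: the filter below registers itself as the
parser's lexical handler and wraps each CDATA section in an element.
It is written against Python's xml.sax and its expat-based parser, not
Java. The element name <cdata> and the names CDataMarker and mark_cdata
are arbitrary choices for this example. Entity references and DTD
declarations would need further handlers and are not covered.

```python
import io
import xml.sax
from xml.sax.handler import property_lexical_handler
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class CDataMarker(XMLFilterBase):
    """Turn CDATA section boundaries into <cdata>...</cdata> elements."""

    def parse(self, source):
        # Ask the underlying parser to report lexical events to this filter.
        self.setProperty(property_lexical_handler, self)
        super().parse(source)

    # Lexical events: only the CDATA boundaries are made visible here.
    def startCDATA(self):
        self._cont_handler.startElement("cdata", AttributesImpl({}))

    def endCDATA(self):
        self._cont_handler.endElement("cdata")

    def comment(self, content):
        pass

    def startDTD(self, name, public_id, system_id):
        pass

    def endDTD(self):
        pass


def mark_cdata(xml_text):
    """Run the filter over a string and return the marked-up document as text."""
    out = io.StringIO()
    reader = CDataMarker(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()
```

A postprocessing step (see point 6) could turn the <cdata> elements back
into CDATA sections on output if the round trip matters.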

6. A postprocessing SAX filter (or simply a SAX ContentHandler) is
useful in two principal situations: 

   (a) to undo the changes made by a preprocessing filter
   (b) to achieve serialization effects that cannot be achieved using
the standard serialization methods (as an alternative to
disable-output-escaping).

Sometimes a user-written serializer can be produced by subclassing the
standard serializer supplied with your chosen product. This will of
course be product-dependent and your code may not work with future
releases of the product.
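
For 6(b), a plain ContentHandler can act as a small serializer of its
own. The sketch below (Python's xml.sax again, not Java) writes every
non-ASCII character as a numeric character reference. NcrSerializer and
serialize_with_ncrs are invented names, and the handler deliberately
ignores comments, processing instructions, and non-ASCII element names.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import escape, quoteattr


def _ncr(text):
    """Replace each non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


class NcrSerializer(ContentHandler):
    """Minimal serializer: collects markup in self.parts as ASCII-only text."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def startElement(self, name, attrs):
        attr_text = "".join(
            " %s=%s" % (key, _ncr(quoteattr(value)))
            for key, value in attrs.items()
        )
        self.parts.append("<%s%s>" % (name, attr_text))

    def endElement(self, name):
        self.parts.append("</%s>" % name)

    def characters(self, content):
        # Escape markup characters first, then convert non-ASCII to NCRs.
        self.parts.append(_ncr(escape(content)))


def serialize_with_ncrs(xml_text):
    """Parse a string and return it re-serialized with NCRs."""
    handler = NcrSerializer()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.parts)
```

In a real pipeline the handler would receive the transformation result
as SAX events instead of re-parsing a string.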

7. It's also possible to write a SAX filter to preprocess the
stylesheet. This is less common, but it can be used to tackle problems
such as dynamic sort keys, or XPath expressions that are contained
within source documents.
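
A sketch of the dynamic sort key case: the filter rewrites the select
attribute of every xsl:sort element before the stylesheet is compiled.
It is Python's xml.sax, not Java, and the names SetSortKey and
set_sort_key are hypothetical. It assumes the stylesheet uses the
conventional "xsl" prefix, because namespace processing is left off.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl


class SetSortKey(XMLFilterBase):
    """Replace the select attribute of each xsl:sort with a chosen XPath."""

    def __init__(self, parent, sort_key):
        super().__init__(parent)
        self._sort_key = sort_key

    def startElement(self, name, attrs):
        # Matching on the literal prefix is an assumption, see the text above.
        if name == "xsl:sort":
            patched = dict(attrs.items())
            patched["select"] = self._sort_key
            attrs = AttributesImpl(patched)
        super().startElement(name, attrs)


def set_sort_key(stylesheet_text, sort_key):
    """Return the stylesheet text with every xsl:sort pointing at sort_key."""
    out = io.StringIO()
    reader = SetSortKey(xml.sax.make_parser(), sort_key)
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(io.BytesIO(stylesheet_text.encode("utf-8")))
    return out.getvalue()
```

The sort key would typically come from a request parameter or from the
source document, and the patched stylesheet is then handed to the XSLT
processor as usual.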

The new STX specification provides the prospect of being able to write
SAX filters without needing to do low-level Java coding. If this takes
off, I think that the idea of doing a complex transformation as a
pipeline of SAX filters, some generated using XSLT and some using STX,
may become increasingly attractive. Although XSLT 2.0 deals with nearly
all the limitations of XSLT 1.0 in areas such as text processing,
grouping, and aggregation, it doesn't address the problem of handling
large input documents.

Michael Kay
Software AG
home: Michael(_dot_)H(_dot_)Kay(_at_)ntlworld(_dot_)com
work: Michael(_dot_)Kay(_at_)softwareag(_dot_)com 

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list