xsl-list

Re: [xsl] Transforming large XML documents with XSLT 1.0

2019-04-09 04:25:05
Hi Martin,

On Tue, Apr 9, 2019 at 12:32 PM Martin Honnen martin(_dot_)honnen(_at_)gmx(_dot_)de <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

> Your subject and introduction seem to talk about a general approach to
> use XSLT 1 with very large documents but it seems the code you have in
> the repository assumes a certain XML document structure where your
> trigger elements as children of the root element are the only nodes the
> XSLT needs to transform, and all of them in an isolated way where the
> template for them does nothing but deal with the particular element and
> does not navigate to ancestors or siblings.


The code I've proposed for transforming large XML documents is not as
generic as the XSLT language itself. I think we can handle different XML
vocabularies with this approach by writing part of the transformation logic
in the Java code and part in the stylesheets (serialization done in Java has
to be combined with serialization done by the XSLT processor). So far I've
explored this approach for XML vocabularies like the following:

<root>
    <element>
       ...
    </element>
    <element>
       ...
    </element>
    ...
</root>

The number of <element> nodes may be very large. As soon as the StAX parser
encounters an <element> node, the XSLT transformer transforms that element
and the result is serialized to the output file. The transformation of each
<element> node writes its output to a common file, which is always opened in
append mode.
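The loop described above can be sketched roughly as follows. This is a minimal sketch, not the code from the repository, assuming the JDK's built-in StAX parser and XSLT 1.0 processor (javax.xml.stream and javax.xml.transform). The class `StaxXsltChunks`, its stylesheet and the trigger name are hypothetical, and trigger elements are assumed to occur only as children of the root. It relies on `StAXSource` accepting an `XMLStreamReader` positioned on a `START_ELEMENT`, in which case the JDK processor reads only that element's subtree; where the reader ends up afterwards is implementation-dependent, so the loop re-inspects the current event instead of calling `next()` unconditionally.

```java
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class StaxXsltChunks {

    // Hypothetical stylesheet: each <element> becomes an <item> holding its <name> text.
    // omit-xml-declaration keeps the per-element outputs concatenable in one file.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/element'><item><xsl:value-of select='name'/></item></xsl:template>"
      + "</xsl:stylesheet>";

    /** Pulls events from `in` and runs `templates` on every element named `trigger`. */
    static void transformEach(Reader in, String trigger, Templates templates, Writer out)
            throws XMLStreamException, TransformerException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        Transformer transformer = templates.newTransformer(); // reused serially for every element
        while (reader.hasNext()) {
            if (reader.isStartElement() && reader.getLocalName().equals(trigger)) {
                // StAXSource on a START_ELEMENT: the transformer consumes this subtree only.
                transformer.transform(new StAXSource(reader), new StreamResult(out));
                continue; // the reader has moved on; re-inspect the current event
            }
            reader.next();
        }
        reader.close();
    }

    public static void main(String[] args) throws Exception {
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));
        String xml = "<root><element><name>a</name></element>"
                   + "<element><name>b</name></element></root>";
        // To append to a common file instead, pass new FileWriter(outputFile, true).
        Writer out = new OutputStreamWriter(System.out);
        transformEach(new StringReader(xml), "element", templates, out);
        out.flush();
    }
}
```

Only one element subtree is held by the transformer at a time, which is what keeps memory use bounded by the size of a single <element> rather than the whole document.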

Another use case I've worked with is splitting a large XML document using
a StAX parser together with the transformation APIs.
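A splitting variant can be sketched the same way, again assuming the JDK's built-in processor; `StaxSplitter` and `split` are hypothetical names, not the original code. Here the transformer is the identity transform (`TransformerFactory.newTransformer()` with no stylesheet), so each trigger element is serialized unchanged as its own XML document. In a real splitter each part would go to its own file rather than into a list of strings.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

class StaxSplitter {

    /** Serializes every element named `trigger` as a standalone XML document. */
    static List<String> split(Reader in, String trigger)
            throws XMLStreamException, TransformerException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        // newTransformer() without a stylesheet is the identity transform.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        List<String> parts = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.isStartElement() && reader.getLocalName().equals(trigger)) {
                StringWriter part = new StringWriter(); // e.g. a new FileWriter per part
                identity.transform(new StAXSource(reader), new StreamResult(part));
                parts.add(part.toString());
                continue; // the reader may already be past this subtree
            }
            reader.next();
        }
        reader.close();
        return parts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><element id='1'>x</element><element id='2'>y</element></root>";
        for (String part : split(new StringReader(xml), "element")) {
            System.out.println(part);
        }
    }
}
```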


> wouldn't the same approach work using SAX?


I think feeding SAX events from an XML input document to an XSLT
transformer cannot scale as well as StAX parsing can. SAX is a push API (the
parser continuously pushes SAX events to the application), while StAX is a
pull API (the application asks for the next event only when it has
processed the previous one). With a pull API the application can hand one
complete <element> to the transformer and resume parsing only afterwards.


> Also writing out the XML result's XML declaration and root element as
> bytes to a stream seems awkward, isn't there a way to chain your Stax
> StreamReader to a StreamWriter to simply write out XML with a dedicated
> API ensuring well-formedness and encoding?


This seems like a nice idea. I'll try to explore it.
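A minimal sketch of that chaining, assuming the JDK's built-in StAX implementation: an `XMLStreamReader` is copied event by event to an `XMLStreamWriter`, so the XML declaration, escaping and output encoding come from the writer API instead of hand-written bytes. `StaxCopy` is a hypothetical name; this sketch copies elements, attributes, namespace declarations, text and comments, and skips DTDs and processing instructions.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class StaxCopy {

    /** Copies `in` to `out` as UTF-8; the writer emits the declaration and escaping. */
    static void copy(Reader in, OutputStream out) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        while (r.hasNext()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: writeStart(r, w); break;
                case XMLStreamConstants.END_ELEMENT:   w.writeEndElement(); break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.SPACE:
                case XMLStreamConstants.CDATA:         w.writeCharacters(r.getText()); break;
                case XMLStreamConstants.COMMENT:       w.writeComment(r.getText()); break;
                default: break; // DTDs, PIs etc. are skipped in this sketch
            }
        }
        w.writeEndDocument();
        w.flush();
        w.close(); // does not close the underlying stream
        r.close();
    }

    private static void writeStart(XMLStreamReader r, XMLStreamWriter w) throws XMLStreamException {
        String uri = r.getNamespaceURI();
        String prefix = r.getPrefix();
        if (uri == null || uri.isEmpty()) {
            w.writeStartElement(r.getLocalName());
        } else {
            w.writeStartElement(prefix == null || prefix.isEmpty()
                ? XMLConstants.DEFAULT_NS_PREFIX : prefix, r.getLocalName(), uri);
        }
        // Re-declare the namespaces that this element declared in the input.
        for (int i = 0; i < r.getNamespaceCount(); i++) {
            String p = r.getNamespacePrefix(i);
            if (p == null || p.isEmpty()) w.writeDefaultNamespace(r.getNamespaceURI(i));
            else w.writeNamespace(p, r.getNamespaceURI(i));
        }
        for (int i = 0; i < r.getAttributeCount(); i++) {
            String ns = r.getAttributeNamespace(i);
            if (ns == null || ns.isEmpty()) {
                w.writeAttribute(r.getAttributeLocalName(i), r.getAttributeValue(i));
            } else {
                w.writeAttribute(r.getAttributePrefix(i), ns,
                    r.getAttributeLocalName(i), r.getAttributeValue(i));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        copy(new StringReader("<root><element a='1'>x &amp; y</element></root>"), buf);
        System.out.println(new String(buf.toByteArray(), StandardCharsets.UTF_8));
    }
}
```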


> In the .NET world you can do that with XmlReader/XmlWriter.


Ok. I'll try to explore that as well.





-- 
Regards,
Mukul Gandhi
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--