Re: [xsl] Streaming with XSLT version 3.0
2014-03-11 10:34:03
Hi all,
First of all, I want to thank you all for your opinions and feedback on
this thread.
Regarding the OutOfMemory problem, this can happen if the
transformation result is very large and the user chooses to display it in
the Results view. The Results view uses a simple text area, which cannot
load such large content.
To disable this option, edit the associated transformation scenario,
open the 'Output' tab, and clear all the checkboxes in the
'Show in results view as' section.
More details about how to configure the transformation scenario output
can be found here:
http://oxygenxml.com/doc/ug-editor/#topics/the-output-tab.html#the-output-tab
I will add a feature request in our issue tracking system to improve the
handling of this situation.
After I disabled displaying the transformation output in the Results
view, I tried to transform a 3 GB file with a structure similar to the
one posted by Terry. This revealed another problem: the transformation
took six times longer in oXygen than on the command line.
This happens because Saxon-EE schema-based validation (-val:lax) is
enabled by default when oXygen runs a transformation with the Saxon-EE
processor.
The main feature of the first Saxon-EE versions was schema-aware
validation (the -sa switch). So we assumed that a user who chose
Saxon-EE wanted schema-aware validation.
Since then, the list of features available in Saxon-EE has grown, and
there are now many more reasons to use Saxon-EE.
I will add an issue in our bug tracking system to reconsider the default
for this option (-val).
To disable schema-aware validation, edit the associated scenario and
press the 'Advanced Options' button next to the Saxon-EE processor
combo box. This button opens a dialog where you can customize the
Saxon-EE processor. In this dialog, choose 'Disable schema validation'
for the 'Validation on source file (-val)' option.
In conclusion, if you do not show the output in the Results view and
you disable schema-aware validation, the transformation takes the same
time whether you run it from oXygen or from the command line.
Regards,
Radu
--
Radu Pisoi
<oXygen/> XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com
On 3/8/2014 23:14, Terry Badger wrote:
Michael,
I ran the process successfully. See my notes below. I have reported it to
Oxygen.
Details for running a large file with XSLT 3.0 streaming
==========
Large source file is found here:
http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-articles-multistream.xml.bz2
==========
Here is the result of running Saxon from a DOS shell: a respectable 21
minutes and no out-of-memory error.
C:\Temp\wiki>C:\Progra~2\Java\jre7\bin\java -Xmx180m -Xss4096k -Xms48m -cp C:/saxon/saxon9ee.jar; net.sf.saxon.Transform -TJ -t -it:main -o:C:/Temp/wiki/out/wiki-03-output.xml C:/Temp/wiki/xsl/wiki-03.xsl
Saxon-EE 9.5.1.4J from Saxonica
Java version 1.7.0_45
Using license serial number V001638
Generating byte code...
Stylesheet compilation time: 476 milliseconds
Processing (no source document) initial template = main
URIResolver.resolve href="../source/enwiki.xml" base="file:/C:/Temp/wiki/xsl/wiki-03.xsl"
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Writing to file:/C:/Temp/wiki/out/output-wiki-03.xml
Execution time: 21m 24.612s (1284612ms)
Memory used: 25491272
NamePool contents: 28 entries in 27 chains. 7 URIs
==========
With this XSLT stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.mediawiki.org/xml/export-0.8/"
    xpath-default-namespace="http://www.mediawiki.org/xml/export-0.8/"
    exclude-result-prefixes="#all"
    version="3.0">
  <xsl:output method="xml"/>
  <!-- Declared but not referenced below; with -it:main there is no global context item, so evaluating "/" would be an error. -->
  <xsl:variable name="root" select="/"/>
  <xsl:mode streamable="yes"/>
  <!-- Entry point, invoked with -it:main (no source document on the command line). -->
  <xsl:template name="main">
    <!-- Read the input as a stream instead of building the whole tree in memory. -->
    <xsl:stream href="../source/enwiki.xml">
      <xsl:result-document href="../out/output-wiki-03.xml">
        <count>
          <!-- Visit each page once, carrying the running total as an iteration parameter. -->
          <xsl:iterate select="mediawiki/page">
            <xsl:param name="count" select="0" as="xs:decimal"/>
            <xsl:next-iteration>
              <xsl:with-param name="count" select="$count+1"/>
            </xsl:next-iteration>
            <!-- After the last page, emit the final total. -->
            <xsl:on-completion>
              <xsl:value-of select="$count"/>
            </xsl:on-completion>
          </xsl:iterate>
        </count>
      </xsl:result-document>
    </xsl:stream>
  </xsl:template>
</xsl:stylesheet>
============
Producing this result file:
<?xml version="1.0" encoding="UTF-8"?>
<count xmlns="http://www.mediawiki.org/xml/export-0.8/">13355093</count>
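For anyone who wants to cross-check that count outside XSLT: the same number can be computed in constant memory with the JDK's StAX pull parser (javax.xml.stream) instead of Saxon streaming. This is a swapped-in technique and only a minimal sketch; the class name PageCounter is hypothetical, and it assumes, as the stylesheet does, that the page elements are direct children of the mediawiki root element in the export-0.8 namespace.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: counts mediawiki/page elements with a StAX pull parser,
// holding only the current event in memory (not Saxon streaming).
public class PageCounter {
    // Namespace of the MediaWiki export-0.8 dump used in this thread.
    static final String MW_NS = "http://www.mediawiki.org/xml/export-0.8/";

    // Counts <page> elements that are direct children of a <mediawiki> root in MW_NS.
    public static long countPages(InputStream in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        long count = 0;
        int depth = 0;                    // element nesting depth; root element is depth 1
        boolean rootIsMediawiki = false;  // equivalent of the "mediawiki" step in mediawiki/page
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    boolean inNs = MW_NS.equals(reader.getNamespaceURI());
                    if (depth == 1) {
                        rootIsMediawiki = inNs && "mediawiki".equals(reader.getLocalName());
                    } else if (depth == 2 && rootIsMediawiki
                            && inNs && "page".equals(reader.getLocalName())) {
                        count++;
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
        } finally {
            reader.close();  // does not close the underlying stream; the caller owns it
        }
        return count;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println(countPages(in));
        }
    }
}
```

After compiling with javac PageCounter.java, it could be run as java PageCounter C:/Temp/wiki/source/enwiki.xml against the decompressed dump (that is where ../source/enwiki.xml resolves from the stylesheet's location). Because it processes one parse event at a time, memory use should stay roughly flat regardless of file size.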
============
When I ran the same source and stylesheet files in Oxygen 15.2 with Saxon
9.5.1.3, it failed with an out-of-memory error after about an hour. I have
reported it to Oxygen.
On Saturday, March 8, 2014 5:43 AM, Michael Kay <mike(_at_)saxonica(_dot_)com>
wrote:
Could you try it outside oXygen? You can get a 30-day free Saxon-EE evaluation
license to enable this. That will establish whether the problem is primarily a
Saxon one or an oXygen one, which will make it a lot easier to help you.
Michael Kay
Saxonica
On 7 Mar 2014, at 23:10, Terry Badger <terry_badger(_at_)yahoo(_dot_)com> wrote:
David,
Thank you. I tried your suggestion, but it still failed with an out-of-memory
error.
Terry
On Friday, March 7, 2014 9:10 AM, David Rudel <fwqhgads(_at_)gmail(_dot_)com>
wrote:
Terry,
You can address the possibility that oXygen is simply choking on the
output by wrapping your output in <xsl:result-document> instructions.
If you pipe output to a file, oXygen does not attempt to display it in
the application when the scenario completes. This would eliminate at
least one possible reason for the crash without requiring you to run
from the command line.
-David
On Fri, Mar 7, 2014 at 1:09 AM, Abel Braaksma (Exselt)
<abel(_at_)exselt(_dot_)net> wrote:
It is also important to find out what is actually causing the
memory exception. If you run it from oXygen as you say, it is quite
possible that the exception comes from oXygen itself, because it cannot
handle the output file. This would explain the late memory
exception. To find out, simply run it from the command line and
watch what happens to memory in Task Manager.
--
"A false conclusion, once arrived at and widely accepted is not
dislodged easily, and the less it is understood, the more tenaciously
it is held." - Cantor's Law of Preservation of Ignorance.
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--