xsl-list

Re: [xsl] Streaming with XSLT version 3.0

2014-03-08 15:17:58
Michael,
I ran the process successfully outside oXygen; my notes are below. I have 
reported the out-of-memory problem to oXygen.
Details for running a large file with XSLT 3.0 streaming
==========
Large source file is found here: 
http://dumps.wikimedia.org/enwiki/20130403/enwiki-20130403-pages-articles-multistream.xml.bz2
==========
Here is the result of running Saxon from a DOS shell: a respectable 21 
minutes and no out-of-memory error
C:\Temp\wiki>C:\Progra~2\Java\jre7\bin\java -Xmx180m -Xss4096k -Xms48m -cp 
C:/saxon/saxon9ee.jar; net.sf.saxon.Transform -TJ -t -it:main  
-o:C:/Temp/wiki/out/wiki-03-output.xml C:/Temp/wiki/xsl/wiki-03.xsl 
Saxon-EE 9.5.1.4J from Saxonica
Java version 1.7.0_45
Using license serial number V001638
Generating byte code...
Stylesheet compilation time: 476 milliseconds
Processing  (no source document) initial template = main
URIResolver.resolve href="../source/enwiki.xml" 
base="file:/C:/Temp/wiki/xsl/wiki-03.xsl"
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Writing to file:/C:/Temp/wiki/out/output-wiki-03.xml
Execution time: 21m 24.612s (1284612ms)
Memory used: 25491272
NamePool contents: 28 entries in 27 chains. 7 URIs
==========
With this XSLT stylesheet
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns="http://www.mediawiki.org/xml/export-0.8/"
    xpath-default-namespace="http://www.mediawiki.org/xml/export-0.8/"
    exclude-result-prefixes="#all"
    version="3.0">
    <xsl:output method="xml"/>
    <xsl:variable name="root" select="/"/>
    <xsl:mode streamable="yes"/>
    <xsl:template name="main">
        <xsl:stream href="../source/enwiki.xml">
            <xsl:result-document href="../out/output-wiki-03.xml">
                <count>
                    <xsl:iterate select="mediawiki/page">
                        <xsl:param name="count" select="0" as="xs:decimal"/>
                        <xsl:next-iteration>
                            <xsl:with-param name="count" select="$count+1"/>
                        </xsl:next-iteration>
                        <xsl:on-completion>
                            <xsl:value-of select="$count"/>
                        </xsl:on-completion>
                    </xsl:iterate>
                </count>
            </xsl:result-document>
        </xsl:stream>
    </xsl:template>
</xsl:stylesheet>
============
With this result file
<?xml version="1.0" encoding="UTF-8"?>
<count xmlns="http://www.mediawiki.org/xml/export-0.8/">13355093</count>
============
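As an independent sanity check on that count, the same tally can be sketched outside XSLT. The Python below was not part of the run reported here; it assumes a decompressed copy of the dump and the export-0.8 namespace from the stylesheet, and the name count_pages is hypothetical. It uses the standard library's incremental parser and discards each finished page, so memory should stay roughly flat. Unlike the stylesheet's mediawiki/page path, it counts page elements at any depth, which should only matter if page elements can be nested.

```python
import io
import xml.etree.ElementTree as ET

# Namespace copied from the stylesheet; adjust for other export versions.
MW_NS = "http://www.mediawiki.org/xml/export-0.8/"


def count_pages(source, tag="{%s}page" % MW_NS):
    """Count <page> elements in `source` (a path or binary file object)
    without holding the whole document in memory."""
    count = 0
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is always the start of the document element.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            count += 1
            # Drop the pages parsed so far so the tree does not grow.
            root.clear()
    return count


if __name__ == "__main__":
    # Tiny in-memory document standing in for the real dump.
    sample = (
        b'<mediawiki xmlns="' + MW_NS.encode("ascii") + b'">'
        b"<siteinfo/>"
        b"<page><title>A</title></page>"
        b"<page><title>B</title></page>"
        b"</mediawiki>"
    )
    print(count_pages(io.BytesIO(sample)))
```

For the real file, one would pass the path to the decompressed dump (for example the ../source/enwiki.xml referenced by the stylesheet) in place of the in-memory sample.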
Running in oXygen 15.2 with Saxon 9.5.1.3, using the same source and stylesheet 
files, we got an out-of-memory error after about an hour. I have reported it to 
oXygen.

 

On Saturday, March 8, 2014 5:43 AM, Michael Kay <mike(_at_)saxonica(_dot_)com> 
wrote:
Could you try it outside oXygen? You can get a 30-day free Saxon-EE evaluation 
license to enable this. That will establish whether the problem is primarily a 
Saxon one or an oXygen one, which will make it a lot easier to help you.

Michael Kay
Saxonica

On 7 Mar 2014, at 23:10, Terry Badger <terry_badger(_at_)yahoo(_dot_)com> wrote:

David,
Thank you. I tried your suggestion but it still failed with an out-of-memory 
report.
Terry


On Friday, March 7, 2014 9:10 AM, David Rudel <fwqhgads(_at_)gmail(_dot_)com> 
wrote:
Terry,
You can address the possibility that oXygen is simply choking on the
output by wrapping your output in <xsl:result-document> instructions.

If you pipe output to a file, oXygen does not attempt to display it in
the application when the scenario completes. This would eliminate at
least one possible reason for the crash without requiring you to run
from the command line.

-David

On Fri, Mar 7, 2014 at 1:09 AM, Abel Braaksma (Exselt) 
<abel(_at_)exselt(_dot_)net> wrote:

It is also important to try to find out what is actually causing the
memory exception. If you run it from oXygen as you say, it is quite
possible that the exception comes from oXygen itself, which may not be
capable of handling the output file. This would explain the late memory
exception. To find out, simply run it from the command line and
watch what happens to memory in Task Manager.


-- 

"A false conclusion, once arrived at and widely accepted is not
dislodged easily, and the less it is understood, the more tenaciously
it is held." - Cantor's Law of Preservation of Ignorance.


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>

--~-- 
