xsl-list
[xsl] "Heap" of trouble handling input file of 500 MByte

2011-02-19 13:47:32

Hello,

Thanks mainly to this list, I am successfully processing 6,335 of my 6,337 
input files. The 6,335 are under 250 MByte each. The two problem cases are each 
just under 500 MByte. 

Are there any tips or tricks or tools which will make this possible on my 
32-bit Windows XP SP3 machine? 

I am using Java code and the javax.xml.* classes to do the transform. The main 
piece of Executor.java is:

// Prepare the transformer factory
javax.xml.transform.TransformerFactory transFact =
    javax.xml.transform.TransformerFactory.newInstance();
// Prepare an XSL source for the transformer
javax.xml.transform.Source xsltSource =
    new javax.xml.transform.stream.StreamSource(xslFile);
// Make the Transformer
javax.xml.transform.Transformer trans = transFact.newTransformer(xsltSource);
.....
// Prepare the transformation input and output
File xmlFileInput = new File(folder + xmlfilename);
File xmlFileOutput = new File(targetFolder + addedPrefix + xmlfilename);
javax.xml.transform.Source xmlSource =
    new javax.xml.transform.stream.StreamSource(xmlFileInput);
javax.xml.transform.Result xmlResult =
    new javax.xml.transform.stream.StreamResult(xmlFileOutput);
// Do the transform
trans.transform(xmlSource, xmlResult);
                                                
The 6,335 files process fine within Eclipse 3.6.1 with a heap of one GByte, 
using "-Xms1024m -Xmx1024m" in the Run Configuration. 

The same Java class also runs fine (in 6,335 cases) from the Windows XP SP3 
command line: 
  java Executor myArguments  -Xms1024m -Xmx1024m

I have increased the heap settings at the command line, in steps, going as high 
as 4 GB, 
          java Executor myArguments  -Xms4096m -Xmx4096m

but regardless, the two big files provoke:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Unknown Source)
        at java.util.Arrays.copyOf(Unknown Source)
        at java.util.Vector.ensureCapacityHelper(Unknown Source)
        at java.util.Vector.addElement(Unknown Source)
        at com.sun.org.apache.xml.internal.dtm.ref.sax2dtm.SAX2DTM2.startElement(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.dom.SAXImpl.startElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.dom.XSLTCDTMManager.getDTM(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.getDOM(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
        at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(Unknown Source)
        at Executor.main(Executor.java:183)

What to try next?

Is it really a lack of heap space? I imagine that heap requirements are 
proportional to file size, but then a 3 GByte heap should work. 
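
One way to check whether the heap settings are actually taking effect is to ask the running JVM what maximum heap it sees. The following is only a minimal sketch: the class name HeapCheck and both helper methods are hypothetical and not part of Executor.java. It relies on the fact that the java launcher treats everything after the class name as program arguments, so options such as -Xmx are only honored when they appear before the class name. It also flags any -X options that ended up among the program arguments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Executor.java: reports the heap limit
// the JVM is really running with.
class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    // Returns program arguments that look like JVM options (-X...).
    // If any show up here, they were passed to main() instead of the JVM.
    static List<String> misplacedJvmOptions(String[] args) {
        List<String> found = new ArrayList<String>();
        for (String arg : args) {
            if (arg.startsWith("-X")) {
                found.add(arg);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
        for (String option : misplacedJvmOptions(args)) {
            System.out.println("Ignored by the JVM (after class name): " + option);
        }
    }
}
```

Running it once as "java HeapCheck -Xmx1024m" and once as "java -Xmx1024m HeapCheck" should show whether the reported limit changes with the position of the option. On a 32-bit Windows JVM the largest heap that can be reserved is typically well below 2 GByte, so a request for 4 GByte placed before the class name would be expected to fail at startup rather than be silently accepted.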

Could the problem relate to the XSL code itself, which is very brief and does a 
sort-while-copying operation?
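
For readers unfamiliar with the pattern, a sort-while-copying transform has roughly the shape sketched below. This is not the stylesheet from this post: the class name SortCopySketch, the element names, and the "key" attribute are assumptions made purely for illustration. The sketch uses the same javax.xml.transform calls as Executor.java, but on a small in-memory document.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: class, element, and attribute names are illustrative.
class SortCopySketch {

    // A brief XSLT 1.0 stylesheet that copies the root element and emits
    // its child elements sorted by an assumed "key" attribute.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/*'>"
      + "<xsl:copy>"
      + "<xsl:for-each select='*'>"
      + "<xsl:sort select='@key'/>"
      + "<xsl:copy-of select='.'/>"
      + "</xsl:for-each>"
      + "</xsl:copy>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the result.
    static String sortCopy(String xml) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<items><item key='bravo'/><item key='alpha'/>"
                   + "<item key='charlie'/></items>";
        System.out.println(sortCopy(xml));
    }
}
```

A sort cannot emit its first output node until it has seen every input node, so a transform of this shape needs the whole input available regardless of how brief the stylesheet is. In the stack trace above, the error is raised while the processor is still building its in-memory tree of the input (the getDOM and getDTM frames), which suggests the size of the input tree, rather than the sort itself, is what exhausts the heap.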

TIA!
