xsl-list
RE: [xsl] String conversion problem when string is large

2012-03-21 13:32:54
-----Original Message-----
From: Michael Kay [mailto:mike(_at_)saxonica(_dot_)com]
Sent: Tuesday, March 20, 2012 3:50 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] String conversion problem when string is large

Try changing this:

           <xsl:with-param name="HexData">
             <xsl:value-of select="substring-after($HexData, ',')" />
           </xsl:with-param>

to this:

           <xsl:with-param name="HexData"
select="substring-after($HexData, ',')" />


Passing the parameter as a string will be MUCH more efficient
than passing it as a TinyTree.

Even better, though probably not necessary, would be to pass
the original unchanged string plus an offset.

Michael Kay
Saxonica

This problem is proving to be quite educational or insightful.

So far, of the five engines I have (xsltproc, sablotron, AltovaXML.exe,
msxsl.exe, and saxonhe.jar), only saxonhe.jar works with this data set and the
recursive implementations I have tried.  It would appear that for this
conversion to work in most engines, it must be implemented in a completely
different way.

The following implementation follows the suggestion to pass the unchanged
string plus an index.  Count is passed in so that the terminating condition
does not have to be recomputed, at some expense, on every call.

  <xsl:call-template name="HexToDec">
    <xsl:with-param name="HexData" select="." />
    <xsl:with-param name="Count" select="@Count" />
  </xsl:call-template>

  <!-- =======================================================================
   !
   ! Convert data in the form "0xhh,0xhh,...", to a comma-separated list of
   ! decimal numbers in the form "n,n,...".  Count is the number of items
   ! in HexData.  Index is the item currently being converted.
   !
   ! -->
<xsl:template name="HexToDec">
  <xsl:param name="HexData" />
  <xsl:param name="Count" />
  <xsl:param name="Index" select="0" />
  <xsl:if test="$Index &lt; $Count">
    <xsl:variable name="Hex" select="'0123456789ABCDEF'" />
    <xsl:text>,</xsl:text>
    <xsl:value-of
      select="string-length(
                substring-before(
                  $Hex, substring($HexData, $Index * 5 + 3, 1))) * 16 +
              string-length(
                substring-before(
                   $Hex,substring($HexData, $Index * 5 + 4, 1)))"
      />
    <xsl:call-template name="HexToDec">
      <xsl:with-param name="Count"   select="$Count" />
      <xsl:with-param name="HexData" select="$HexData" />
      <xsl:with-param name="Index"   select="$Index + 1" />
    </xsl:call-template>
  </xsl:if>
</xsl:template>

It appears that simply using a plain variable reference in the parameter 
constitutes a pass by reference, and that part of the original implementation 
problem was that the sub-string operations were allocating new copies of the 
string in memory at each level of the recursion.

This implementation halves Saxon's memory usage at the cost of a much longer
execution time, though the last example below almost halves the memory usage
again.

  W/Index & Count
  Execution time: 1m 41.223s (101223ms)
  Memory used: 57863448

vs.

  Original after collapsing the xsl:with-param to use select attribute:
  Execution time: 854ms
  Memory used: 110204152

vs.

  Like W/Index & Count except that Count is computed.
  Execution time: 11m 9.027s (669027ms)
  Memory used: 32725384

vs.

  Like W/Index & Count except that Hex is an xsl:param with a default value:
  Execution time: 1m 40.987s (100987ms)
  Memory used: 60866736

  <xsl:call-template name="HexToDec">
    <xsl:with-param name="HexData" select="." />
    <xsl:with-param name="Count" select="@Count" />
  </xsl:call-template>

  <xsl:template name="HexToDec">
    <xsl:param name="HexData" />
    <xsl:param name="Count" />
    <xsl:param name="Index" select="0" />
    <xsl:param name="Hex" select="'0123456789ABCDEF'" />
    <xsl:if test="$Index &lt; $Count">
      <xsl:text>,</xsl:text>
      <xsl:value-of
        select="string-length(
                  substring-before(
                    $Hex, substring($HexData, $Index * 5 + 3, 1))) * 16 +
                string-length(
                  substring-before(
                     $Hex,substring($HexData, $Index * 5 + 4, 1)))"
        />
      <xsl:call-template name="HexToDec">
        <xsl:with-param name="Count"   select="$Count" />
        <xsl:with-param name="HexData" select="$HexData" />
        <xsl:with-param name="Index"   select="$Index + 1" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

vs.

  Like W/Index & Count except that Hex is passed explicitly on every call:
  Execution time: 1m 40.994s (100994ms)
  Memory used: 29581720

  <xsl:call-template name="HexToDec">
    <xsl:with-param  name="HexData" select="." />
    <xsl:with-param  name="Count" select="@Count" />
    <xsl:with-param  name="Hex" select="'0123456789ABCDEF'" />
  </xsl:call-template>

  <xsl:template name="HexToDec">
    <xsl:param name="HexData" />
    <xsl:param name="Count" />
    <xsl:param name="Index" select="0" />
    <xsl:param name="Hex" />
    <xsl:if test="$Index &lt; $Count">
      <xsl:text>,</xsl:text>
      <xsl:value-of
        select="string-length(
                  substring-before(
                    $Hex, substring($HexData, $Index * 5 + 3, 1))) * 16 +
                string-length(
                  substring-before(
                     $Hex,substring($HexData, $Index * 5 + 4, 1)))"
        />
      <xsl:call-template name="HexToDec">
        <xsl:with-param name="HexData" select="$HexData" />
        <xsl:with-param name="Index"   select="$Index + 1" />
        <xsl:with-param name="Count"   select="$Count" />
        <xsl:with-param name="Hex"     select="$Hex" />
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

I had no idea what all was going on under the hood.

Based on memory usage, perhaps I am getting more of an idea how to pass by 
reference.
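
For engines without tail recursion optimization, the divide-and-conquer
recursion suggested in the quoted message below is one candidate for a
"completely different way".  What follows is only an untested sketch, not
something I have benchmarked against this data set.  It splits the range of
items in half at each level, so the recursion depth should grow with the
logarithm of Count rather than with Count itself.  The template name
HexToDecDC, the First parameter, and the match pattern are hypothetical; the
hex arithmetic is copied from the templates above.

  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" />

    <!-- Hypothetical match pattern: any element carrying a Count
         attribute.  string() and number() pass plain values rather
         than nodes. -->
    <xsl:template match="*[@Count]">
      <xsl:call-template name="HexToDecDC">
        <xsl:with-param name="HexData" select="string(.)" />
        <xsl:with-param name="Count" select="number(@Count)" />
      </xsl:call-template>
    </xsl:template>

    <!-- Convert Count items of "0xhh," starting at item First. -->
    <xsl:template name="HexToDecDC">
      <xsl:param name="HexData" />
      <xsl:param name="First" select="0" />
      <xsl:param name="Count" />
      <xsl:param name="Hex" select="'0123456789ABCDEF'" />
      <xsl:choose>
        <!-- Nothing to convert. -->
        <xsl:when test="$Count &lt; 1" />
        <!-- Exactly one item: convert it, same arithmetic as above. -->
        <xsl:when test="$Count = 1">
          <xsl:text>,</xsl:text>
          <xsl:value-of
            select="string-length(
                      substring-before(
                        $Hex, substring($HexData, $First * 5 + 3, 1))) * 16 +
                    string-length(
                      substring-before(
                        $Hex, substring($HexData, $First * 5 + 4, 1)))"
            />
        </xsl:when>
        <!-- More than one item: convert each half in turn. -->
        <xsl:otherwise>
          <xsl:variable name="Half" select="floor($Count div 2)" />
          <xsl:call-template name="HexToDecDC">
            <xsl:with-param name="HexData" select="$HexData" />
            <xsl:with-param name="First"   select="$First" />
            <xsl:with-param name="Count"   select="$Half" />
            <xsl:with-param name="Hex"     select="$Hex" />
          </xsl:call-template>
          <xsl:call-template name="HexToDecDC">
            <xsl:with-param name="HexData" select="$HexData" />
            <xsl:with-param name="First"   select="$First + $Half" />
            <xsl:with-param name="Count"   select="$Count - $Half" />
            <xsl:with-param name="Hex"     select="$Hex" />
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>
  </xsl:stylesheet>

Neither recursive call is in tail position, but with 53392 items the depth
should stay at around 16 levels, which is why this shape might suit engines
that do not optimize tail calls.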

On 20/03/2012 19:58, Bulgrien, Kevin wrote:
-----Original Message-----
From: Bulgrien, Kevin [mailto:Kevin(_dot_)Bulgrien(_at_)GDSATCOM(_dot_)com]
Sent: Tuesday, March 20, 2012 2:06 PM
To: 'xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com'
Subject: RE: [xsl] String conversion problem when string is large

-----Original Message-----
From: Michael Kay [mailto:mike(_at_)saxonica(_dot_)com]
Sent: Tuesday, March 20, 2012 1:39 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] String conversion problem when string is large

The simplest solution is to just find a different XSLT
processor, one that implements tail recursion optimization.
Saxon, for example.

You could rewrite the code either to use XSLT 2.0 string
handling or to use divide-and-conquer recursion, but unless
there is something that ties you to your current XSLT
processor there is no need to change the code.

Michael Kay
Saxonica
-----
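
As a rough illustration of the XSLT 2.0 string handling mentioned above, a
non-recursive version could be sketched as follows.  It assumes a stylesheet
declared with version="2.0", upper-case hex digits, and input in the form
"0xhh,0xhh,...".  The match pattern is hypothetical, and the sketch has not
been run against the data set discussed in this thread.

  <xsl:stylesheet version="2.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="text" />

    <xsl:variable name="Hex" select="'0123456789ABCDEF'" />

    <!-- Hypothetical match pattern: any element carrying a Count
         attribute. -->
    <xsl:template match="*[@Count]">
      <!-- Leading comma emitted so the output has the ",n,n,..." shape. -->
      <xsl:text>,</xsl:text>
      <!-- Split on commas, skip empty tokens, convert the two hex digits
           of each token, and join the results with commas. -->
      <xsl:value-of separator=","
        select="for $t in tokenize(., ',')[. != '']
                return
                  string-length(
                    substring-before($Hex, substring($t, 3, 1))) * 16 +
                  string-length(
                    substring-before($Hex, substring($t, 4, 1)))" />
    </xsl:template>
  </xsl:stylesheet>

There is no recursion here at all, so neither stack depth nor repeated
substring copies should be a concern, but it does tie the stylesheet to an
XSLT 2.0 processor.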

I didn't expect that answer... I guess that's encouraging.

I have tried the Java version of SaxonB 9-1-0-8j, but some of the
SourceForge links for the most recent .zip of SaxonHE9-4 appeared
to be broken (or else something on my company proxy choked), so I
didn't try it before today.  Since your reply, I tried some
creative Googling and turned up a download link that works.  I'll
give SaxonHE9-4-0-3J.zip a try.

-----

Well, I tried SaxonHE9-4 and got:

$ java -Xms1g -Xmx2g -jar ~/bin/saxon9he.jar -t \
    -s:develop/idiffout.xml -xsl:idiffout.xsl -o:idiffout.csv
Saxon-HE 9.4.0.3J from Saxonica
Java version 1.6.0_22
Warning: at xsl:stylesheet on line 2 column 80 of idiffout.xsl:
   Running an XSLT 1 stylesheet with an XSLT 2 processor
Stylesheet compilation time: 437 milliseconds
Processing file:/home/kbulgrien/cvs/r8000/update/IDiff2DUA/develop/idiffout.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Building tree for file:/home/kbulgrien/cvs/r8000/update/IDiff2DUA/develop/idiffout.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Tree built in 162 milliseconds
Tree size: 6616 nodes, 570130 characters, 10303 attributes
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at net.sf.saxon.tree.util.FastStringBuffer.condense(FastStringBuffer.java:485)
    at net.sf.saxon.expr.instruct.DocumentInstr.evaluateItem(DocumentInstr.java:308)
    at net.sf.saxon.expr.parser.ExpressionTool.evaluate(ExpressionTool.java:320)
    at net.sf.saxon.expr.instruct.GeneralVariable.getSelectValue(GeneralVariable.java:529)
    at net.sf.saxon.expr.instruct.Instruction.assembleParams(Instruction.java:187)
    at net.sf.saxon.expr.instruct.CallTemplate.processLeavingTail(CallTemplate.java:369)
    at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
    at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:794)
    at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
    at net.sf.saxon.expr.instruct.Template.expand(Template.java:231)
    at net.sf.saxon.expr.instruct.CallTemplate$CallTemplatePackage.processLeavingTail(CallTemplate.java:526)
    at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:239)
    at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:199)
    at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
    at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:794)
    at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
    at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1034)
    at net.sf.saxon.expr.instruct.ApplyTemplates$ApplyTemplatesPackage.processLeavingTail(ApplyTemplates.java:476)
    at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:239)
    at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:199)
    at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
    at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1034)
    at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:237)
    at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:199)
    at net.sf.saxon.expr.instruct.Choose.processLeavingTail(Choose.java:794)
    at net.sf.saxon.expr.instruct.Block.processLeavingTail(Block.java:615)
    at net.sf.saxon.expr.instruct.Template.applyLeavingTail(Template.java:212)
    at net.sf.saxon.trans.Mode.applyTemplates(Mode.java:1034)
    at net.sf.saxon.expr.instruct.ApplyTemplates.apply(ApplyTemplates.java:237)
    at net.sf.saxon.expr.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:199)

$ tail -1 idiffout.csv | awk 'BEGIN { FS=","; } { print NF " vs " $9 }' -
3954 vs 53392

I don't know if there is a better way to invoke the
processor or not, nor if I should try the .NET version instead.
I suppose it is possible that something else in the overall
transform is to blame, but the transform exploded in the same spot.

Kevin Bulgrien


This message and/or attachments may include information
subject to GD Corporate Policy 07-105 and is intended to be
accessed only by authorized personnel of General Dynamics and
approved service providers.  Use, storage and transmission
are governed by General Dynamics and its policies.
Contractual restrictions apply to third parties.  Recipients
should refer to the policies or contract to determine proper
handling.  Unauthorized review, use, disclosure or
distribution is prohibited.  If you are not an intended
recipient, please contact the sender and destroy all copies
of the original message.


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


