xsl-list

Re: [xsl] Question on streaming and grouping with nested keys

2017-07-14 08:02:23
2017-07-14 14:41 GMT+02:00 Martin Honnen martin(_dot_)honnen(_at_)gmx(_dot_)de <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>:

On 14.07.2017 14:05, Felix Sasaki felix(_at_)sasakiatcf(_dot_)com wrote:

I tried the example from Martin with

<xsl:template match="TRANSACTION-LIST">
      <xsl:copy>
         <xsl:for-each-group select="copy-of(TRANSACTION)"
                             group-by="ITEM2/SUBITEM2/GROUPING-KEY">
            <xsl:copy>
               <item1-sum><xsl:value-of
                   select="sum(current-group()/ITEM2/SUBITEM2.1)"/></item1-sum>

...

It gives me an out of memory error. The input file is 160 MB, but the
individual transactions are rather small (around 20+ elements). The error
also appears if I remove "<xsl:copy>".


160 MB doesn't sound like a file you need streaming for at all. Does that
suggestion above cause memory problems only when using streaming (e.g. when
you have <xsl:mode streamable="yes"/>) or also without streaming?



Without streaming it works.



Have you tried increasing the memory for Saxon/Java?



No.



As you mention Saxon EE, let's hope Michael Kay comes across this thread;
he can certainly tell you more about how to tackle that problem with his
product.

I have a working solution using an accumulator and maps, see below, but
here I did not manage to use streaming. If I set the accumulator to
 streamable="yes", Saxon EE tells me


"The xsl:accumulator-rule/@select expression for a streaming accumulator
must be motionless"


Although I am using copy-of() as in Martin's example.


  <xsl:accumulator name="gather-values"
                   as="map(xs:anyAtomicType, node())"
                   initial-value="map{}">
     <xsl:accumulator-rule match="TRANSACTION">
       <xsl:variable name="current" select="copy-of()"/>


As far as I understand it, you can't use copy-of() in an accumulator you
want to be streamable. Working with streaming and accumulating values
requires a change of the usual XSLT coding habits, I think. For instance,
to capture the key with a streamable accumulator you would need to use
e.g.
     <xsl:accumulator-rule 
match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
select="string()"/>
as only on the text node you are able to read out that value while
streaming through the document.
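
To make that rule concrete, here is a minimal sketch of a complete stylesheet built around it. It is untested against the real data and rests on some assumptions: the accumulator name "current-key", the reset rule on TRANSACTION and the <keys>/<key> output elements are made up for illustration, and the path to GROUPING-KEY is the one from the rule above (the earlier group-by used ITEM2/SUBITEM2/GROUPING-KEY, so adjust as needed).

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Stream the input; skip unmatched nodes, descending into elements. -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"
            use-accumulators="current-key"/>

  <!-- Hypothetical name: holds the most recent grouping key seen. -->
  <xsl:accumulator name="current-key" as="xs:string?"
                   initial-value="()" streamable="yes">
    <!-- Reset at the start of each TRANSACTION so no stale key survives. -->
    <xsl:accumulator-rule match="TRANSACTION" select="()"/>
    <!-- Motionless: the context item is the text node itself. -->
    <xsl:accumulator-rule
        match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
        select="string()"/>
  </xsl:accumulator>

  <xsl:template match="TRANSACTION-LIST">
    <keys><xsl:apply-templates/></keys>
  </xsl:template>

  <xsl:template match="TRANSACTION">
    <!-- Consume the descendants first ... -->
    <xsl:apply-templates/>
    <!-- ... then read the accumulator once the element has been passed. -->
    <key><xsl:value-of select="accumulator-after('current-key')"/></key>
  </xsl:template>

</xsl:stylesheet>
```

The accumulator-after() call is placed after xsl:apply-templates on purpose: as far as I understand the streamability rules, the value after the descendants is only available once they have been consumed.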

So to try to solve that problem with accumulators and streaming I think
you need several of them, one counting ITEM1, one summing up
SUBITEM2.1/text(), the above for the key and then you need to combine them
to store the data together.
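
As a rough sketch of the first two of those accumulators (counting and summing), something like the following might do. It is untested, and several things are assumptions: the accumulator names and the <totals> wrapper are invented, ITEM1 is matched wherever it occurs because its position is not shown in this thread, and the SUBITEM2.1 text is assumed to be castable to xs:decimal. It produces running totals for the whole document only; storing them per grouping key (e.g. in a map) is the combining step and is not shown here.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:mode streamable="yes" on-no-match="shallow-skip"
            use-accumulators="item1-count subitem-sum"/>

  <!-- Hypothetical name: counts ITEM1 elements as they stream past. -->
  <xsl:accumulator name="item1-count" as="xs:integer"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="ITEM1" select="$value + 1"/>
  </xsl:accumulator>

  <!-- Hypothetical name: sums SUBITEM2.1 values, read on the text node. -->
  <xsl:accumulator name="subitem-sum" as="xs:decimal"
                   initial-value="0" streamable="yes">
    <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()"
                          select="$value + xs:decimal(.)"/>
  </xsl:accumulator>

  <xsl:template match="TRANSACTION-LIST">
    <!-- Walk the whole subtree so that every accumulator rule fires. -->
    <xsl:apply-templates/>
    <!-- Read the final values once the descendants have been consumed. -->
    <totals>
      <item1-count>
        <xsl:value-of select="accumulator-after('item1-count')"/>
      </item1-count>
      <item1-sum>
        <xsl:value-of select="accumulator-after('subitem-sum')"/>
      </item1-sum>
    </totals>
  </xsl:template>

</xsl:stylesheet>
```

Both rules only look at the node they match (an element start or a text node), which is what keeps their select expressions motionless.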



Thanks. Working without accumulators is fine, just trying to understand the
issue. Other input files are a bit bigger, up to 1.5 GB, so having a
streaming solution would be nice but it's not mandatory.

- Felix



