On 14.07.2017 15:02, Felix Sasaki felix(_at_)sasakiatcf(_dot_)com wrote:
2017-07-14 14:41 GMT+02:00 Martin Honnen martin(_dot_)honnen(_at_)gmx(_dot_)de:
On 14.07.2017 14:05, Felix Sasaki felix(_at_)sasakiatcf(_dot_)com wrote:
I tried the example from Martin with
<xsl:template match="TRANSACTION-LIST">
<xsl:copy>
<xsl:for-each-group select="copy-of(TRANSACTION)"
group-by="ITEM2/SUBITEM2/GROUPING-KEY">
<xsl:copy>
<item1-sum><xsl:value-of
select="sum(current-group()/ITEM2/SUBITEM2.1)"/></item1-sum>
...
It gives me an out of memory error. The input file is 160 MB, but the
individual transactions are rather small (around 20+ elements).
The error also appears if I remove "<xsl:copy>".
160 MB doesn't sound like a file you need streaming for at all. Does
that suggestion above cause memory problems only when using
streaming (e.g. when you have <xsl:mode streamable="yes"/>) or also
without streaming?
Without streaming it works.
That sounds odd.
Thanks. Working without accumulators is fine, just trying to understand
the issue. Other input files are a bit bigger, up to 1.5 GB, so having a
streaming solution would be nice but it's not mandatory.
I have now tried to solve it with streaming accumulators, using
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:math="http://www.w3.org/2005/xpath-functions/math"
xmlns:map="http://www.w3.org/2005/xpath-functions/map"
exclude-result-prefixes="xs math map"
expand-text="true"
version="3.0">
<xsl:param name="STREAMABLE" as="xs:boolean" static="yes"
select="true()"/>
<xsl:mode _streamable="{$STREAMABLE}" on-no-match="shallow-skip"
use-accumulators="item1-count subitem groups"/>
<xsl:output indent="yes"/>
<xsl:accumulator name="item1-count" as="xs:integer"
initial-value="0" _streamable="{$STREAMABLE}">
<xsl:accumulator-rule match="TRANSACTION" select="0"/>
<xsl:accumulator-rule match="TRANSACTION/ITEM1" select="$value
+ 1"/>
</xsl:accumulator>
<xsl:accumulator name="subitem" as="xs:integer" initial-value="0"
_streamable="{$STREAMABLE}">
<xsl:accumulator-rule
match="TRANSACTION/ITEM2/SUBITEM2.1/text()" select="xs:integer(.)"/>
</xsl:accumulator>
<xsl:accumulator name="groups" as="map(xs:string, map(xs:string,
xs:integer))" initial-value="map{}" _streamable="{$STREAMABLE}">
<xsl:accumulator-rule
match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
select="let $key := string(),
$count := accumulator-before('item1-count'),
$sum := accumulator-before('subitem')
return if (not(map:contains($value, $key)))
then map:put($value, $key, map { 'count' :
$count, 'sum' : $sum })
else let $value-map := $value($key)
return map:put($value, $key, map { 'count' :
$count + $value-map?count, 'sum' : $sum + $value-map?sum })"/>
</xsl:accumulator>
<xsl:template match="TRANSACTION-LIST">
<xsl:copy>
<xsl:apply-templates/>
<xsl:variable name="groups"
select="accumulator-after('groups')"/>
<xsl:for-each select="map:keys($groups)">
<transaction key="{.}">
<count>{$groups(.)?count}</count>
<amount>{$groups(.)?sum}</amount>
</transaction>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
I had thought that, when matching on a text() node, it is possible to
consume its value, and Saxon does not complain about the accumulator
<xsl:accumulator name="subitem" as="xs:integer" initial-value="0"
_streamable="{$STREAMABLE}">
<xsl:accumulator-rule
match="TRANSACTION/ITEM2/SUBITEM2.1/text()" select="xs:integer(.)"/>
</xsl:accumulator>
However, for the more complex one
<xsl:accumulator name="groups" as="map(xs:string, map(xs:string,
xs:integer))" initial-value="map{}" _streamable="{$STREAMABLE}">
<xsl:accumulator-rule
match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
select="let $key := string(),
$count := accumulator-before('item1-count'),
$sum := accumulator-before('subitem')
return if (not(map:contains($value, $key)))
then map:put($value, $key, map { 'count' :
$count, 'sum' : $sum })
else let $value-map := $value($key)
return map:put($value, $key, map { 'count' :
$count + $value-map?count, 'sum' : $sum + $value-map?sum })"/>
</xsl:accumulator>
it continues to complain with
Static error at xsl:accumulator-rule on line 33 column 136 of
count-sum-accum1.xsl:
XTSE3430: The xsl:accumulator-rule/@select expression (or contained
sequence constructor)
for a streaming accumulator must be motionless
As I have no other implementation to test with (the Feb 2016 build of Exselt
is too old to support the XSLT 3.0 final spec syntax details), I can't
tell whether Saxon is right, and I am afraid I still get lost when doing
streamability analysis by hand.
When I disable streaming, the code seems to give the right result on
some simplified test data:
<?xml version="1.0" encoding="UTF-8"?>
<TRANSACTION-LIST>
<TRANSACTION>
<ITEM1>1</ITEM1>
<ITEM2>
<SUBITEM2.1>10</SUBITEM2.1>
<SUBITEM2.2>
<GROUPING-KEY>a</GROUPING-KEY>
</SUBITEM2.2>
</ITEM2>
</TRANSACTION>
<TRANSACTION>
<ITEM1>1</ITEM1>
<ITEM2>
<SUBITEM2.1>10</SUBITEM2.1>
<SUBITEM2.2>
<GROUPING-KEY>b</GROUPING-KEY>
</SUBITEM2.2>
</ITEM2>
</TRANSACTION>
<TRANSACTION>
<ITEM1>1</ITEM1>
<ITEM2>
<SUBITEM2.1>10</SUBITEM2.1>
<SUBITEM2.2>
<GROUPING-KEY>c</GROUPING-KEY>
</SUBITEM2.2>
</ITEM2>
</TRANSACTION>
<TRANSACTION>
<ITEM1>1</ITEM1>
<ITEM2>
<SUBITEM2.1>10</SUBITEM2.1>
<SUBITEM2.2>
<GROUPING-KEY>a</GROUPING-KEY>
</SUBITEM2.2>
</ITEM2>
</TRANSACTION>
<TRANSACTION>
<ITEM1>1</ITEM1>
<ITEM2>
<SUBITEM2.1>10</SUBITEM2.1>
<SUBITEM2.2>
<GROUPING-KEY>b</GROUPING-KEY>
</SUBITEM2.2>
</ITEM2>
</TRANSACTION>
<TRANSACTION>
<ITEM1>1</ITEM1>
<ITEM2>
<SUBITEM2.1>10</SUBITEM2.1>
<SUBITEM2.2>
<GROUPING-KEY>c</GROUPING-KEY>
</SUBITEM2.2>
</ITEM2>
</TRANSACTION>
</TRANSACTION-LIST>
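To cross-check the accumulator output on the sample above, a plain non-streaming grouping stylesheet can serve as a reference. This is only a sketch and not part of the stylesheet under discussion. It assumes the element names of the test data (ITEM1, ITEM2/SUBITEM2.1, ITEM2/SUBITEM2.2/GROUPING-KEY), and it counts ITEM1 elements per group, which is what the item1-count accumulator feeds into the map. For the six sample transactions it should report a count of 2 and an amount of 20 for each of the keys a, b and c.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Non-streaming reference: group transactions by GROUPING-KEY and
     compute per-group count and sum with xsl:for-each-group. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                expand-text="yes"
                version="3.0">

  <xsl:output indent="yes"/>

  <xsl:template match="TRANSACTION-LIST">
    <xsl:copy>
      <!-- One group per distinct key found in the sample structure -->
      <xsl:for-each-group select="TRANSACTION"
                          group-by="ITEM2/SUBITEM2.2/GROUPING-KEY">
        <transaction key="{current-grouping-key()}">
          <!-- Number of ITEM1 elements across the group's transactions -->
          <count>{count(current-group()/ITEM1)}</count>
          <!-- Sum of the SUBITEM2.1 values across the group -->
          <amount>{sum(current-group()/ITEM2/SUBITEM2.1)}</amount>
        </transaction>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Since map:keys() returns the keys in an implementation-dependent order, the transaction elements of the accumulator version may appear in a different order than here; only the per-key values are comparable.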