
Re: [xsl] Question on streaming and grouping with nested keys

2017-07-14 09:13:29
On 14.07.2017 15:02, Felix Sasaki felix(_at_)sasakiatcf(_dot_)com wrote:


2017-07-14 14:41 GMT+02:00 Martin Honnen martin(_dot_)honnen(_at_)gmx(_dot_)de <xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>:

    On 14.07.2017 14:05, Felix Sasaki felix(_at_)sasakiatcf(_dot_)com wrote:

        I tried the example from Martin with

        <xsl:template match="TRANSACTION-LIST">
               <xsl:copy>
                  <xsl:for-each-group select="copy-of(TRANSACTION)" group-by="ITEM2/SUBITEM2/GROUPING-KEY">
                     <xsl:copy>
                        <item1-sum><xsl:value-of select="sum(current-group()/ITEM2/SUBITEM2.1)"/></item1-sum>

        ...

        It gives me an out of memory error. The input file is 160MB, but the
        individual transactions are rather small (around 20+ elements).
        The error also appears if I remove "<xsl:copy>".


    160 MB doesn't sound like a file you need streaming for at all. Does
    that suggestion above cause memory problems only when using
    streaming (e.g. when you have <xsl:mode streamable="yes"/>) or also
    without streaming?


Without streaming it works.

That sounds odd.



Thanks. Working without accumulators is fine, just trying to understand the issue. Other input files are a bit bigger, up to 1.5 GB, so having a streaming solution would be nice but it's not mandatory.

I have now tried to solve it with streaming accumulators, using

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="xs math map"
    expand-text="true"
    version="3.0">

    <xsl:param name="STREAMABLE" as="xs:boolean" static="yes" select="true()"/>

    <xsl:mode _streamable="{$STREAMABLE}" on-no-match="shallow-skip" use-accumulators="item1-count subitem groups"/>

    <xsl:output indent="yes"/>

    <xsl:accumulator name="item1-count" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
        <xsl:accumulator-rule match="TRANSACTION" select="0"/>
        <xsl:accumulator-rule match="TRANSACTION/ITEM1" select="$value + 1"/>
    </xsl:accumulator>

    <xsl:accumulator name="subitem" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
        <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()" select="xs:integer(.)"/>
    </xsl:accumulator>

    <xsl:accumulator name="groups" as="map(xs:string, map(xs:string, xs:integer))" initial-value="map{}" _streamable="{$STREAMABLE}">
        <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
            select="let $key := string(),
                        $count := accumulator-before('item1-count'),
                        $sum := accumulator-before('subitem')
                    return if (not(map:contains($value, $key)))
                           then map:put($value, $key, map { 'count' : $count, 'sum' : $sum })
                           else let $value-map := $value($key)
                                return map:put($value, $key, map { 'count' : $count + $value-map?count, 'sum' : $sum + $value-map?sum })"/>
    </xsl:accumulator>

    <xsl:template match="TRANSACTION-LIST">
        <xsl:copy>
            <xsl:apply-templates/>
            <xsl:variable name="groups" select="accumulator-after('groups')"/>
            <xsl:for-each select="map:keys($groups)">
                <transaction key="{.}">
                    <count>{$groups(.)?count}</count>
                    <amount>{$groups(.)?sum}</amount>
                </transaction>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

I had thought that, when matching on a text() node, it is possible to consume its value, and Saxon does not complain about the accumulator

    <xsl:accumulator name="subitem" as="xs:integer" initial-value="0" _streamable="{$STREAMABLE}">
        <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.1/text()" select="xs:integer(.)"/>
    </xsl:accumulator>

However, for the more complex one


    <xsl:accumulator name="groups" as="map(xs:string, map(xs:string, xs:integer))" initial-value="map{}" _streamable="{$STREAMABLE}">
        <xsl:accumulator-rule match="TRANSACTION/ITEM2/SUBITEM2.2/GROUPING-KEY/text()"
            select="let $key := string(),
                        $count := accumulator-before('item1-count'),
                        $sum := accumulator-before('subitem')
                    return if (not(map:contains($value, $key)))
                           then map:put($value, $key, map { 'count' : $count, 'sum' : $sum })
                           else let $value-map := $value($key)
                                return map:put($value, $key, map { 'count' : $count + $value-map?count, 'sum' : $sum + $value-map?sum })"/>
    </xsl:accumulator>

it continues to complain with

Static error at xsl:accumulator-rule on line 33 column 136 of count-sum-accum1.xsl: XTSE3430: The xsl:accumulator-rule/@select expression (or contained sequence constructor)
  for a streaming accumulator must be motionless

As I have no other implementation to test with (the Feb 2016 build of Exselt is too old to support the syntax details of the final XSLT 3.0 spec), I can't tell whether Saxon is right, and I am afraid I still get lost when doing streamability analysis by hand.

When I disable streaming, the code seems to give the right result on some simplified test data:

<?xml version="1.0" encoding="UTF-8"?>
<TRANSACTION-LIST>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>a</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>b</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>c</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>a</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>b</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
    <TRANSACTION>
        <ITEM1>1</ITEM1>
        <ITEM2>
            <SUBITEM2.1>10</SUBITEM2.1>
            <SUBITEM2.2>
                <GROUPING-KEY>c</GROUPING-KEY>
            </SUBITEM2.2>
        </ITEM2>
    </TRANSACTION>
</TRANSACTION-LIST>
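
To cross-check the accumulator output, a non-streaming grouping along the following lines could be run against the same test data. It is only a sketch: it assumes each TRANSACTION has exactly one ITEM2/SUBITEM2.1 and one GROUPING-KEY, as in the sample, and it reuses the result element names (transaction, count, amount) from the stylesheet above. For the sample it should report a count of 2 and an amount of 20 for each of the keys a, b and c.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes"
    version="3.0">

    <xsl:output indent="yes"/>

    <!-- Non-streaming reference: the whole TRANSACTION-LIST is held in memory -->
    <xsl:template match="TRANSACTION-LIST">
        <xsl:copy>
            <!-- One group per distinct GROUPING-KEY value, in order of first appearance -->
            <xsl:for-each-group select="TRANSACTION"
                group-by="ITEM2/SUBITEM2.2/GROUPING-KEY">
                <transaction key="{current-grouping-key()}">
                    <!-- Number of ITEM1 elements across all transactions in the group -->
                    <count>{count(current-group()/ITEM1)}</count>
                    <!-- Sum of the SUBITEM2.1 values across the group -->
                    <amount>{sum(current-group()/ITEM2/SUBITEM2.1)}</amount>
                </transaction>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Unlike map:keys(), whose order is implementation-dependent, xsl:for-each-group emits the groups in order of first appearance, so the two results may need sorting before they are compared.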