Re: [xsl] How To Use Streaming To Group Elements in a Flat List?

2017-05-02 17:25:34
Running your code on Saxon 9.7, I get

  XTSE3430: Template rule is declared streamable but it does not satisfy the 
streamability rules. 
  * The xsl:for-each-group/@group-starting-with pattern is not motionless

That's because a pattern predicate such as *[(position() mod 1000) = 0]
involves counting preceding siblings. Or to look at it another way, the
pattern can't be evaluated simply by looking at the node in isolation; it has
to examine its position relative to other nodes in the document.

But there's an easy workaround: use

  group-adjacent="(position() - 1) idiv 1000"

With this formulation, position() counts the items being grouped (the
selected ROW elements), not the siblings of each node, so rows 1 to 1000 get
key 0, rows 1001 to 2000 get key 1, and so on.
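
As a small illustration of how this grouping key partitions the input, here
is a minimal non-streaming sketch. It is not part of the original stylesheet:
it assumes a group size of 3 and uses the integers 1 to 7 in place of ROW
elements, and it is started from the standard xsl:initial-template entry
point so that it needs no source document.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

    <xsl:output method="text"/>

    <!-- Illustration only: group size 3 instead of 1000 -->
    <xsl:template name="xsl:initial-template">
        <!-- position() here is the position within the selected sequence -->
        <xsl:for-each-group select="1 to 7"
            group-adjacent="(position() - 1) idiv 3">
            <!-- One line per group: the key, then the group's members -->
            <xsl:value-of select="current-grouping-key(), ':', current-group()"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each-group>
    </xsl:template>

</xsl:stylesheet>
```

The keys come out as 0, 1 and 2, with groups (1, 2, 3), (4, 5, 6) and (7):
every group is full except possibly the last.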

Here's the full stylesheet:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:template match="ROWDATA">
        <xsl:variable name="resultURIbase" as="xs:string"
            select="concat('out', '/rowdata-')"
        />
        <xsl:variable name="rootname" as="xs:string" select="name(.)"/>
        
        <xsl:for-each-group select="ROW"
            group-adjacent="(position() - 1) idiv 1000">
            <xsl:result-document
                href="{concat($resultURIbase, generate-id(), '.xml')}">
                <xsl:element name="{$rootname}">
                    <xsl:copy-of select="current-group()"/>
                </xsl:element>
            </xsl:result-document>
        </xsl:for-each-group>
        
    </xsl:template>
    
</xsl:stylesheet>
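
If sequentially numbered output files are preferred to generate-id() names,
one possible variant is to build the file name from current-grouping-key(),
which takes the values 0, 1, 2, ... under this grouping key. The following is
a sketch under that assumption, not something that has been checked against
Saxon's streamability analysis; the "out/rowdata-" prefix simply mirrors the
stylesheet above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:template match="ROWDATA">
        <xsl:variable name="rootname" select="name(.)"/>

        <xsl:for-each-group select="ROW"
            group-adjacent="(position() - 1) idiv 1000">
            <!-- Assumption: the key is 0-based, so add 1 for a 1-based
                 file number (rowdata-1.xml, rowdata-2.xml, ...) -->
            <xsl:result-document
                href="out/rowdata-{current-grouping-key() + 1}.xml">
                <xsl:element name="{$rootname}">
                    <xsl:copy-of select="current-group()"/>
                </xsl:element>
            </xsl:result-document>
        </xsl:for-each-group>

    </xsl:template>

</xsl:stylesheet>
```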


On 2 May 2017, at 21:55, Eliot Kimber ekimber(_at_)contrext(_dot_)com 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

I have some very large (100s of MBs) XML database dump docs that I want to 
break into smaller docs. This is an easy application of for-each-group or of 
a simple tail recursion approach but I wanted to use this as an opportunity 
to learn more about XSLT 3 streaming.

I’ve read through the XSLT 3 spec and I think I generally understand the 
options but it’s still not clear either how or how best to do this type of 
grouping so that it’s streamable. I didn’t find any examples of this specific 
use case searching on “xslt streaming with grouping” (other than older items 
that don’t actually work).

If my source looks like this:

<ROWDATA>
   <ROW><SRVC_CAT_ID>54</SRVC_CAT_ID><PARENT_ID>3</PARENT_ID><SRVC_CAT_NAME>Exterior Lights</SRVC_CAT_NAME><PARENT_NAME>Accessories and Body, Cab</PARENT_NAME></ROW>
   <ROW><SRVC_CAT_ID>53</SRVC_CAT_ID><PARENT_ID>3</PARENT_ID><SRVC_CAT_NAME>Exterior Body Panels</SRVC_CAT_NAME><PARENT_NAME>Accessories and Body, Cab</PARENT_NAME></ROW>
   <ROW><SRVC_CAT_ID>51</SRVC_CAT_ID><PARENT_ID>3</PARENT_ID><SRVC_CAT_NAME>Entertainment Systems</SRVC_CAT_NAME><PARENT_NAME>Accessories and Body, Cab</PARENT_NAME></ROW>
   <ROW><SRVC_CAT_ID>40</SRVC_CAT_ID><PARENT_ID>3</PARENT_ID><SRVC_CAT_NAME>Door Locks &amp; Anti-Theft Systems</SRVC_CAT_NAME><PARENT_NAME>Accessories and Body, Cab</PARENT_NAME></ROW>
… lots more rows …
</ROWDATA>

I’d like to generate result files containing 1000 records each, each wrapped 
in the same root element.

The non-streaming for-each-group is simple enough:

   <xsl:template match="ROWDATA">
       <xsl:variable name="resultURIbase" as="xs:string"
           select="concat($outdir, '/rowdata-')"
       />
       <xsl:variable name="rootname" as="xs:string" select="name(.)"/>

       <xsl:for-each-group select="ROW"
           group-starting-with="*[(position() mod 1000) = 0]">
           <xsl:result-document
               href="{concat($resultURIbase, generate-id(), '.xml')}">
               <xsl:element name="{$rootname}">
                   <xsl:copy-of select="current-group()"/>
               </xsl:element>
           </xsl:result-document>
       </xsl:for-each-group>

   </xsl:template>


But I’m not seeing how to do this using, e.g., xsl:iterate. As is often the
case with XSLT, I feel like I’m missing the obvious.

Is it in fact possible to do what I want in a streamable way?

Thanks,

Eliot

--
Eliot Kimber
http://contrext.com


