xsl-list
Re: [xsl] Using sibling value in streaming mode

2019-08-31 03:25:40
I think Martin has provided several options quite well, but perhaps another 
angle will also be helpful.

If the maps are reasonably small, then the simplest approach is "burst-mode" or 
"windowed" streaming: In the template rule with match="map", bind a variable to 
select="copy-of(.)", and then process the tree contained in that variable in 
normal unstreamed fashion.
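
As a rough sketch of the burst-mode idea, assuming the input and output shapes from the original question quoted below and exactly one id per map: the mode name "in-memory" and the parameter name "id" are made up for illustration, and the stylesheet has not been tested against a particular Saxon release.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    expand-text="yes">

  <!-- The unnamed mode streams the main input. -->
  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>
  <!-- Hypothetical unstreamed mode for the in-memory copy of each map. -->
  <xsl:mode name="in-memory" on-no-match="shallow-skip"/>

  <xsl:template match="/array">
    <items>
      <xsl:apply-templates select="map"/>
    </items>
  </xsl:template>

  <!-- Burst mode: copy-of(.) is the only consuming operation here, so
       everything after it works on an ordinary in-memory tree. -->
  <xsl:template match="map">
    <xsl:variable name="map" select="copy-of(.)"/>
    <xsl:apply-templates select="$map/string" mode="in-memory">
      <!-- Assumes exactly one string with key="id" per map. -->
      <xsl:with-param name="id" select="string($map/string[@key = 'id'])"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="string" mode="in-memory">
    <xsl:param name="id"/>
    <item>
      <id>{$id}</id>
      <key>{@key}</key>
      <val>{.}</val>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

Memory use is then bounded by the largest single map rather than by the whole document.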

If you want to achieve some level of streaming within the map, then clearly 
it's not going to be perfect streaming; in the worst case, if the "id" comes 
last, then you're going to have to buffer something in memory. Burst-mode 
streaming buffers the input in memory; an alternative is to buffer the output, 
which you can achieve using xsl:fork:

<xsl:template match="map" mode="streamed" expand-text="yes">
   <xsl:fork>
      <xsl:sequence>
         <id>{string[@key='id']}</id>
      </xsl:sequence>
      <xsl:sequence>
         <xsl:apply-templates select="string[not(@key='id')]" mode="streamed"/>
      </xsl:sequence>
   </xsl:fork>
</xsl:template>

If the maps are too large for that to be viable, then you could go for a 
two-pass solution. In the first streamed pass over the input document, 
construct an in-memory XDM map from position to id. In the second streamed 
pass, as each <map> element is encountered, output the id obtained from this 
XDM map, and then process all the children of the map (skipping the id) in 
streamed mode.
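
A minimal sketch of the two-pass idea, under some loudly stated assumptions: every map contains exactly one id (so the Nth id belongs to the Nth map), the input URI is passed in a hypothetical stylesheet parameter "input-uri", and the transformation starts at the initial template rather than at a principal source document. The names "ids" and "id-by-position" are likewise illustrative only. Unlike the description above, it emits an item for the id entry too, to match the required output in the question; select string[not(@key = 'id')] to skip it.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    expand-text="yes">

  <xsl:mode streamable="yes" on-no-match="shallow-skip"/>

  <!-- Hypothetical parameter: URI of the input document, read twice. -->
  <xsl:param name="input-uri" as="xs:string" required="yes"/>

  <!-- Pass one: stream the input and keep only the id values, in order. -->
  <xsl:variable name="ids" as="xs:string*">
    <xsl:source-document href="{$input-uri}" streamable="yes">
      <xsl:sequence select="array/map/string[@key = 'id']/string(.)"/>
    </xsl:source-document>
  </xsl:variable>

  <!-- XDM map from map position to id; assumes one id per map. -->
  <xsl:variable name="id-by-position" as="map(xs:integer, xs:string)"
      select="map:merge(for $p in 1 to count($ids)
                        return map{ $p : $ids[$p] })"/>

  <!-- Pass two: stream the input again and look each id up by position. -->
  <xsl:template name="xsl:initial-template">
    <items>
      <xsl:source-document href="{$input-uri}" streamable="yes">
        <xsl:apply-templates select="array/map"/>
      </xsl:source-document>
    </items>
  </xsl:template>

  <xsl:template match="map">
    <xsl:apply-templates select="string">
      <xsl:with-param name="id" select="$id-by-position(position())"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="string">
    <xsl:param name="id" as="xs:string?"/>
    <item>
      <id>{$id}</id>
      <key>{@key}</key>
      <val>{.}</val>
    </item>
  </xsl:template>

</xsl:stylesheet>
```

The memory held between the passes is one string per map, independent of how large each map is. If a map can lack an id, the position-based pairing breaks and the first pass would need to record the map position explicitly.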

Another possibility that occurred to me is a self-merge. Use xsl:merge to merge 
the file with itself, using the <map> element's position() as the merge key (if 
that's possible); then extract the id from one of the merge inputs, and the 
other values from the other. But that still requires memory proportional to the 
largest map, because Saxon is going to hold the merge groups in memory (the 
semantics require an implicit call on snapshot()). 

Michael Kay
Saxonica

On 30 Aug 2019, at 22:18, Martynas Jusevičius 
martynas(_at_)atomgraph(_dot_)com 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi,

I've started looking into streaming recently (using Saxon 9.9). I have
a use case like this:

Input:

<array>
   <map>
      <string key="key1">value1</string>
      ...
      <string key="id">123456789</string>
      ...
      <string key="keyN">valueN</string>
   </map>
   ...
</array>

Required output:

<items>
   <item>
      <id>123456789</id>
      <key>key1</key>
      <val>value1</val>
   </item>
   ...
   <item>
      <id>123456789</id>
      <key>id</key>
      <val>123456789</val>
   </item>
   ...
   <item>
      <id>123456789</id>
      <key>keyN</key>
      <val>valueN</val>
   </item>
   ...
</items>

The value of <string key="id"> is used as <id> in <item> elements. The
problem is that <string key="id"> can occur in any position in the
<map>.

I've tried using an accumulator such as

<xsl:accumulator name="map-id" initial-value="()" streamable="yes"
                 as="xs:string?">
  <xsl:accumulator-rule match="/array/map/string[@key = 'id']/text()"
                        select="string(.)"/>
</xsl:accumulator>

and then

<item>
   <id><xsl:value-of select="accumulator-before('map-id')"/></id>
   ...
</item>

That worked partially -- only for sibling <string> elements that
followed the <string key="id">. Which is not surprising.

I've also tried accumulator-after('map-id') but got:

 XTSE3430: Template rule is not streamable
 * A call to accumulator-after() is consuming when there are no
preceding consuming instructions

Is it possible to have a streaming solution in this case?

Martynas

--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--