xsl-list

Re: [xsl] XSLT3 - Streaming + Recursive File Output

2016-08-12 07:06:56
idiv as in integer division
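
For reference, a small sketch of the difference between the two operators. In XPath 2.0 and later, div is ordinary division and idiv is integer division, truncating towards zero. The sample numbers and the template name "main" are illustrative only; the last two lines show why (position()-1) idiv 1000 gives the same key to each run of 1000 items.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Hypothetical entry point; run with a named initial template -->
  <xsl:template name="main">
    <!-- div: ordinary division, 7 div 2 is 3.5 -->
    <xsl:value-of select="7 div 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- idiv: integer division, 7 idiv 2 is 3 -->
    <xsl:value-of select="7 idiv 2"/>
    <xsl:text>&#10;</xsl:text>
    <!-- item 1000 falls in batch key 0 -->
    <xsl:value-of select="(1000 - 1) idiv 1000"/>
    <xsl:text>&#10;</xsl:text>
    <!-- item 1001 falls in batch key 1 -->
    <xsl:value-of select="(1001 - 1) idiv 1000"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```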

On Aug 12, 2016 6:22 AM, "Mailing Lists Mail" <daktapaal(_at_)gmail(_dot_)com> 
wrote:

Dr. Kay,
Thank you for your explanation. This is my first ever streaming stylesheet,
and your explanations are very educational to me. I have some questions.
In your point (A), you said we can switch off the multi-threading in
xsl:result-document. How do we do that?
In point (B), in the for-each-group, you typed idiv. Should it be div? Is it
a typo, or is there a new operator called idiv?

Point (C): changing the initial unnamed template to streamable produced no
results; no files were generated. Also, in the examples given in the spec I
did not see any mode on the initial template.

Thank you, Michael, for your insights. I have learned a lot by asking the
question.

Dak

On Aug 11, 2016 7:13 PM, "Michael Kay mike(_at_)saxonica(_dot_)com" <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

(A) don't equate xsl:fork with multi-threading. In fact, the current
implementation of xsl:fork in Saxon is not multi-threaded
(xsl:result-document might be, but you can switch it off). (Saxon's
streamed processing uses a push model, which complicates many things, but
pushing parser events to multiple consumers doesn't require multiple
threads).
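
On switching it off: to the best of my knowledge, Saxon-EE provides an extension attribute, saxon:asynchronous, on xsl:result-document for this purpose. That is an assumption drawn from the Saxon documentation, not something stated in this thread, so please check it against the documentation for your release. A minimal fragment, reusing the href and content from the example in point (B):

```xslt
<!-- Assumption: saxon:asynchronous="no" asks Saxon-EE to write this
     result document in the calling thread, not a separate one -->
<xsl:result-document href="species{position()}.xml"
    saxon:asynchronous="no"
    xmlns:saxon="http://saxon.sf.net/">
  <species><xsl:copy-of select="current-group()"/></species>
</xsl:result-document>
```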

(B) I think your recursive named template can be replaced with a
streamable call on xsl:for-each-group, something like

<xsl:for-each-group select="*:species"
    group-adjacent="(position()-1) idiv 1000">
  <xsl:result-document href="species{position()}.xml">
    <species><xsl:copy-of select="current-group()"/></species>
  </xsl:result-document>
</xsl:for-each-group>

Compared with your approach, this solution has the advantage of not
imposing an arbitrary limit on the number of elements to be processed.
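
To see how the grouping key forms the batches, here is a minimal, non-streamed sketch that can be tried on a small test file. The batch-size parameter, the batches and batch wrapper elements, and the assumption that the species elements are children of the document element are all illustrative and not part of the original stylesheet. It is meant only to show the arithmetic, not the streamed solution.

```xslt
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <!-- Hypothetical parameter: small batch size so the effect is visible -->
  <xsl:param name="batch-size" select="2"/>

  <xsl:template match="/*">
    <batches>
      <!-- Items 1..n share key 0, items n+1..2n share key 1, and so on -->
      <xsl:for-each-group select="*:species"
          group-adjacent="(position() - 1) idiv $batch-size">
        <!-- Inside for-each-group, position() is the group number -->
        <batch number="{position()}" size="{count(current-group())}">
          <xsl:copy-of select="current-group()"/>
        </batch>
      </xsl:for-each-group>
    </batches>
  </xsl:template>
</xsl:stylesheet>
```

With five species children and the default batch size of 2, this produces three batch elements holding 2, 2 and 1 species respectively, so the last, partial batch is handled without any explicit limit.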

(C) I would expect the initial unnamed mode should be streamable.

(D) In the latest XSLT 3.0 we've provided "streamable stylesheet
functions" - not yet implemented in Saxon - but we stopped short at
streamable named templates. But you couldn't do this kind of batching using
streamable stylesheet functions either. A human reader can see in your code
that the Nth recursive call of the template is always processing nodes that
are later in document order than the (N-1)th recursive call, but it would
require a phenomenal amount of analysis for a theorem-prover to establish
that during static analysis, and even if you could prove it streamable,
generating a streamable execution plan would be far from trivial.

Michael Kay
Saxonica


On 11 Aug 2016, at 23:07, Mailing Lists Mail 
daktapaal(_at_)gmail(_dot_)com <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Dear All,
I have the following problem to solve using XSLT 3.0 streaming. I have
been trying for some time now, and I hit a roadblock no matter which
way I choose. It seems to be an interesting issue, and resolving it
will be a very good learning experience for me.

I have a HUGE XML file (obviously a starting point for XSLT 3.0 streaming).

I am using : SaxonEE9-7-0-7J

Problem Definition

1. Remove a set of nodes (Species) from the source tree
(UniverseKingdom.xml); there can be around 1,000,000 of them.
2. Create a file called UniverseKingdom-without-species.xml which has
every element in UniverseKingdom except the Species nodes.
3. Create batches of 1000 Species and write them out to
AnimalKingdomSpeciesBatch1.xml and so on, until all the Species are
covered.

So when the program runs, I get
1. UniverseKingdom-without-species.xml and 1000 files, each with
1000 Species, with appropriate file names:
AnimalKingdomSpeciesBatch1.xml ... to
AnimalKingdomSpeciesBatch1000.xml

What I did so far (after many attempts; I thought this should work,
but it did not):
<xsl:stylesheet version="3.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema">
   <xsl:mode name="stream" streamable="yes" on-no-match="shallow-copy"/>
   <xsl:strip-space elements="*"/>
   <xsl:output method="xml" indent="yes"/>
   <xsl:template match="/">
       <xsl:result-document href="output\UniverseKingdom-without-species.xml">
           <xsl:stream href="UniverseKingdom.xml">
               <xsl:fork>
                   <xsl:sequence>
                       <xsl:apply-templates mode="stream"/>
                   </xsl:sequence>
                   <xsl:sequence>
                       <xsl:for-each select="*:UniverseKingdom/*:AnimalKingdom">
                             <!-- Call Recursive Templates here -->
                           <xsl:call-template name="batch-animal-species"/>
                       </xsl:for-each>
                   </xsl:sequence>
               </xsl:fork>
           </xsl:stream>
       </xsl:result-document>
   </xsl:template>
   <xsl:template name="batch-animal-species">
       <xsl:param name="limit" select="1000000"/>
       <xsl:param name="batch" select="1"/>
       <xsl:param name="start" select="1"/>
       <xsl:param name="end" select="1000"/>
       <xsl:if test="$start &lt;= $limit ">
           <xsl:result-document href="output\AnimalKingdomSpeciesBatch{$batch}-.xml">
               <species>
                   <xsl:for-each select="*:species[position() = ($start to $end) ]">
                       <species>
                           <xsl:copy-of select="."/>
                       </species>
                   </xsl:for-each>
               </species>
           </xsl:result-document>
           <xsl:call-template name="batch-animal-species">
               <xsl:with-param name="batch" select="$batch+1"/>
               <xsl:with-param name="start" select="$end+1"/>
               <xsl:with-param name="end" select="$end+1000"/>
           </xsl:call-template>
       </xsl:if>
   </xsl:template>
   <xsl:template match="*:species" mode="stream"/>
</xsl:stylesheet>


Here, the issue was with the template batch-animal-species. Saxon
throws this error:

e:\perf\xslt3>java  -jar saxon9ee.jar   str.xml splitter.xsl  -o:StreamAni.xml
Static error at xsl:template on line 22 column 91 of splitter.xsl:
 XTSE3430: Template rule is declared streamable but it does not
 satisfy the streamability rules.
 * Operand . of CallTemplate#batch-animal-species selects streamed
 nodes in a context that allows arbitrary navigation (line 43)
Errors were reported during stylesheet compilation


I know that the logic for chunking the batched files could be made
better, or is even questionable, but I was not expecting the
call-template to fail.

I am hoping some ninja warriors of XSLT3 can help me with this issue.
Seriously, I cannot take no for an answer :) A lot depends on this
...

Also, if someone can think of a smarter way for me to get this done,
possibly without using fork, that would help. (There is an admin
sitting somewhere in the system who has asked us to write code
without multiple threads. He wants to be responsible for the number
of threads and discourages people from spawning them. If that is not
possible, then I will insist that forking has to be done.)
Please help ...
Dak.Tap



