xsl-list

Re: [xsl] I beseech thee, please give me intuition on XSLT Streaming

2013-09-07 06:43:29
There are many cases in which one could allow two downward selections. Your 
example is one. But what if it was this:

<xsl:if test="Title = 'Introduction'">
  <xsl:value-of select="Author"/>
</xsl:if>

you're then faced with the prospect of reading all the information that the 
template might need before you can start the evaluation (remember, there might 
be multiple Authors and multiple Titles, and they might appear in any order). 
And if you generalize it to this:

<xsl:template match="book">
  <xsl:if test="child::Appendix">
    <xsl:apply-templates select="child::Abstract"/>
  </xsl:if>
</xsl:template>

then it becomes fairly obvious that there's no limit on the amount of buffering 
needed.
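
To make that concrete, consider a hypothetical input of the following shape 
(the Chapter elements and the elided content are invented for illustration). 
The Abstract arrives first, but whether it should be processed depends on an 
Appendix that may only show up at the very end of the book, so the Abstract, 
however large, would have to be held until the closing tag is reached:

```xml
<book>
  <!-- Must be kept somewhere until we know whether an Appendix exists. -->
  <Abstract>...</Abstract>
  <Chapter>...</Chapter>
  <!-- ... arbitrarily many more Chapter elements ... -->
  <!-- Only here does the test child::Appendix become true. -->
  <Appendix>...</Appendix>
</book>
```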

The WG made some decisions which not everyone will agree with; one of them is 
that the streamability of a stylesheet should be statically decidable (where 
streamability means the amount of memory needed is bounded). In fact Saxon 
sometimes attempts streaming optimistically even where the amount of 
buffering needed can't be predicted; an example is 
<xsl:apply-templates select="//section/title"/>, which requires buffering in 
some circumstances, e.g. if sections are nested within titles which are 
nested within sections. The spec 
of course allows implementations to attempt to stream anything they like, but 
it defines a subset where all implementations that claim to support streaming 
must do so -- that is, where the transformation must not fail because the data 
is too big for the memory available.
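
The awkward nesting might look like the following hypothetical input (the 
text content is invented for illustration). Both title elements are selected 
by //section/title. Selected nodes are processed in document order, so the 
outer title comes first, and by the time its content has been read the inner 
title has already gone past in the stream; to process the inner title as a 
selected node in its own right, something has to be buffered:

```xml
<section>
  <title>Outer title
    <!-- A section nested inside a title: legal XML, if unusual. -->
    <section>
      <!-- Also matched by //section/title, but it lies inside the
           outer title, which is processed first. -->
      <title>Inner title</title>
    </section>
  </title>
</section>
```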

The decision about static predictability rules out approaches such as the one 
you suggest where the transformation fails if the data exceeds a certain size. 
It's that kind of failure, after all, that streaming is designed to prevent. Is 
it really better to fail because you need more than 100 bytes of buffer, rather 
than (as today) failing because you've run out of memory?

Michael Kay
Saxonica


On 7 Sep 2013, at 10:40, Costello, Roger L. wrote:

Thank you very much Michael for your explanation. That helped a lot.

May I pursue my question a bit further please?

You nicely explained *what* the one-downward-selection rule is, but I would 
like to know the *reason* for that rule. Knowing the reason for the rule will 
help me, I think.

Let's take an example. Suppose the XSLT processor is streaming in and 
processing an XML document. At some point in time I press the HALT button and 
here is what I see: The XSLT processor has just read in this start tag:

      <Book> 

It hasn't read anything beyond that. 

The one-downward-selection rule says that my code can access one child 
element of Book

      <xsl:value-of select="Title" />

but not two child elements 

      <xsl:value-of select="Title" />
      <xsl:value-of select="Author" />

What is the reason for that rule? Is it because there is a fear that by 
allowing access to two child elements the XSLT processor may be forced to 
read in too much input? But if the value of Title is huge, then 

      <xsl:value-of select="Title" />

will force the XSLT processor to read in lots of input. Right? 

If the rule is there due to a fear of forcing the XSLT processor to input too 
much, then it seems a better rule would be based on size:

      The 100 character rule: your code can access any child
      elements, provided the amount of data is less than 
      100 characters.

Thoughts?

/Roger

-----Original Message-----
From: Michael Kay [mailto:mike(_at_)saxonica(_dot_)com] 
Sent: Friday, September 06, 2013 1:17 PM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] I beseech thee, please give me intuition on XSLT Streaming

Yes, this is a serious problem. We are acutely aware of it on the WG. We 
spent the last WG meeting trying to apply the streamability rules to some 
simple examples, and they are very hard to apply by hand. Yet at the same 
time we don't want to reduce the scope of what is streamable just to 
make the rules simpler.

The simplest rule of thumb is that a streamable template is only allowed to 
make one downward selection. So you can select the Title, or the Author, but 
not both. If you want both, there are ways of doing it, but you can't simply 
do what your example does and ask for the Title first and then the Author, 
because they might not appear in that order (and they might be gigabytes 
long: just because elements have nice familiar everyday names like Title and 
Author doesn't allow us to make any assumptions about what they might 
contain).

The simplest way around the "one downward selection" rule is to use your one 
downward selection to make a copy of a subtree:

<xsl:variable name="thisBook" select="copy-of(.)"/>

and within the copy you can do any navigation you like. But you can only make 
a copy of a Book if the Book fits in memory.
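
As a minimal sketch of that workaround applied to the BookCatalogue example 
from the original question: the variant below puts the copy-of() call in the 
select expression of an xsl:for-each instead of in a variable, so that each 
Book is copied as it streams past. It assumes a processor that implements 
streaming along the lines of XSLT 3.0; the draft rules under discussion in 
this thread may differ in detail.

```xml
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:mode streamable="yes"/>

  <xsl:template match="BookCatalogue">
    <Books>
      <!-- The one downward selection: each Book is copied into memory
           in turn as the input streams past. -->
      <xsl:for-each select="Book/copy-of(.)">
        <!-- The context item is now an in-memory copy of one Book, so
             Title and Author can both be read, whatever their order. -->
        <Title><xsl:value-of select="Title"/></Title>
        <Author><xsl:value-of select="Author"/></Author>
      </xsl:for-each>
    </Books>
  </xsl:template>

</xsl:stylesheet>
```

The catalogue as a whole is still streamed; only the Book currently being 
processed should need to be held in memory, which is the trade-off described 
above.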

Some of us are hoping that the solution to this problem might lie in tools: 
tools that explain to you why your code is not streamable. 


On 6 Sep 2013, at 17:43, Costello, Roger L. wrote:

Hi Folks,

... motionless ... sweep ... free-ranging ... group-consuming ...

Those are some of the terms used in the XSLT specification for describing 
streaming.

Ouch!

I am having a hard time slogging through the explanation of XSLT streaming. 

Surely there are some simple intuitions regarding the kind of XSLT/XPath 
code that is (isn't) valid when doing streaming? 

I seek simple rules-of-thumb that will guide me in writing XSLT/XPath code 
for streaming. 

Can you provide such intuitions? Can you provide simple rules-of-thumb?

For example, I have found that it is valid to iterate over each Book and 
output its Title:

<xsl:mode streamable="yes" />

<xsl:template match="BookCatalogue">
      <Books>
          <xsl:iterate select="Book">
              <Title><xsl:value-of select="Title" /></Title>
          </xsl:iterate>
      </Books>
</xsl:template>

But it's not valid to output both Title and Author:

<xsl:mode streamable="yes" />

<xsl:template match="BookCatalogue">
      <Books>
          <xsl:iterate select="Book">
              <Title><xsl:value-of select="Title" /></Title>
              <Author><xsl:value-of select="Author" /></Author>
          </xsl:iterate>
      </Books>
</xsl:template>

Why?

I have no intuition on why the first is valid and the second is invalid.

No doubt somewhere in the XSLT specification there is some precise rule 
which explains why.

I'll never remember all the rules. 

But if I have some intuition to guide me ... ah, that will last a lifetime.

So what is the intuition behind allowing access to one child element but not 
two?

However, I am seeking more than just intuition about that specific example. 
I am seeking intuition about writing XSLT/XPath code in general for 
streaming. Can you provide an intuition so that as I write code I can think: 

     Well, I don't know the specific rule in the XSLT 
     specification, but from my general intuition 
     about streaming I know that this code is (isn't) 
     valid for streaming.

/Roger



--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


