xsl-list

Re: [xsl] XSL performance question: running count of attributes using axes and sum()

2009-04-09 16:54:24

Wendell,

Your and Michael's solution of preprocessing also has the benefit that I could 
round the @lengths down, making the sum easier and faster in the second pass. 
Using Ken's improved axis has achieved the performance benchmark I needed.

Thanks for your help and explanations,
Mark

--- On Thu, 4/9/09, Wendell Piez <wapiez(_at_)mulberrytech(_dot_)com> wrote:


From: Wendell Piez <wapiez(_at_)mulberrytech(_dot_)com>
Subject: Re: [xsl] XSL performance question: running count of attributes using axes and sum()
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Date: Thursday, April 9, 2009, 12:09 PM


Hi Mark,

I think your solution is in multiple passes. Preprocess your data to make your 
values explicit for the presentation phase.

There are a number of ways you could go about it, but I'd consider something 
like this:

1. annotate words with number of syllables in each (i.e., make word/@length 
explicit)
1b. optionally, do the same with lines
2. Use a sibling-recursion approach to calculate offsets at whatever level(s) 
(syllable, word and/or line) you like
3. Then work from the offsets instead of the brute-force calculations
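
As a rough, untested sketch of step 1, assuming the poem/line/word/syl structure 
from the original question: an identity transform plus one extra template can 
write the syllable total onto each word. Skipping elided syllables is an 
assumption borrowed from the original template, and fractional lengths are 
summed as they stand (no rounding here).

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- step 1: make word/@length explicit -->
  <xsl:template match="word">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <!-- assumption: elided syllables do not count toward the total -->
      <xsl:attribute name="length">
        <xsl:value-of
          select="sum(syl[not(@elide) or @elide='false']/@length)"/>
      </xsl:attribute>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

The same pattern, matching line and summing the descendant syllables, would 
cover step 1b.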

Sibling recursion works like this:

<xsl:template match="x" mode="add-offsets">
  <xsl:param name="offset" select="0"/>
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <xsl:attribute name="offset">
      <xsl:value-of select="$offset"/>
    </xsl:attribute>
  </xsl:copy>
  <xsl:apply-templates select="following-sibling::x[1]" mode="add-offsets">
    <xsl:with-param name="offset" select="$offset + @length"/>
  </xsl:apply-templates>
</xsl:template>

You would kick this off by applying templates to the x[1] (only) of any 
sequence of x siblings. As you can see, it goes forward among x element 
siblings until there aren't any left. Essentially, the technique is to force a 
forward traversal of the document, which allows passing parameters along. 
Ordinarily one doesn't want to do this since it prevents the processor from 
optimizing its traversal -- but if you need to do some kinds of intensive 
calculations in document-wide scope (as here), it can be worth it.
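
A minimal sketch of the kick-off, assuming the add-offsets template above and 
the same placeholder element name x. The pattern *[x] (any element that has x 
children) is only an illustration. Note that, as written, this copies nothing 
but the x children of the matched element.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- parent of a run of x siblings: start the chain at the first x only -->
  <xsl:template match="*[x]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="x[1]" mode="add-offsets"/>
    </xsl:copy>
  </xsl:template>

  <!-- each x writes its own offset, then hands the running total on -->
  <xsl:template match="x" mode="add-offsets">
    <xsl:param name="offset" select="0"/>
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:attribute name="offset">
        <xsl:value-of select="$offset"/>
      </xsl:attribute>
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::x[1]" mode="add-offsets">
      <xsl:with-param name="offset" select="$offset + @length"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>

Each x is visited exactly once, so the work grows linearly with the number of 
siblings, rather than every node re-scanning everything before it.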

Preprocessing to calculate lengths of words and lines would enable you to get 
around how your syllables are not all siblings, thereby allowing calculation of 
total offsets instead of just offsets relative to their containers. Another 
possibility for dealing with this would be to use the following:: axis instead 
of the following-sibling:: axis, but (depending on the processor) you might not 
see the same speed gains there.

On the other hand, depending on the size of the data set, you might find that 
simply preprocessing to calculate lengths at the word and line level, and not 
doing the calculation of offsets, helps enough by itself.

If you could use XSLT 2.0, you'd have more options and techniques at your 
disposal.

Also, some processors have extensions that are useful for this sort of thing.

floor() is an XPath 1.0 function, so it is available in XSLT 1.0, and a 
conformant processor will support it.
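
What XPath 1.0 does not allow is a function call as a step in a path, as in 
.../floor(@length) in the template quoted below. That syntax belongs to XPath 
2.0, and it may be what the editor is actually objecting to. A sketch of one 
1.0-only way around it, assuming a preprocessing pass is acceptable: round each 
syl/@length down once, so that the later pass can use a plain sum() over 
@length.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- replace each syllable length with its rounded-down value -->
  <xsl:template match="syl/@length">
    <xsl:attribute name="length">
      <xsl:value-of select="floor(.)"/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>

With the sample data, length="1.5" on "to" would come out as length="1"; the 
other lengths are already integers and pass through unchanged.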

Cheers,
Wendell

Getting quite comfortable using XSL. Since I am using a lot more heavy-duty 
XSL, I am now hitting barriers with performance. My question to the forum is, 
for once, not a beginner's question!

In transforming the <syl> tags below into HTML table cells to display them, I 
need to format each cell with a green color when the running total of the 
@length attributes is a multiple of four. Ideally having the ability to do 
running totals in another variable would be great, but not the best XSL-esque 
solution, so I am using axes instead. I have tried solutions with count and 
sum, but performance is slow: 756 lines like the ones below mean thousands of 
syllables to check, each with its own axis computation -- the complete xform 
takes more than an hour!

Can anyone point me to a solution that is more performant yet still 
elegant/simple?

An aside: it seems that ceiling() is an XPath 1.0 function, but oddly enough 
not floor() -- Altova SPY complains about floor until I change the stylesheet 
to version 2.0 (sigh). I would love this to transform in XSL 1.0 if possible, 
and rounding down each length to the integer is essential to achieve the 
correct formatting result.

Thanks in advance for any help on this.

XML:

<poem>
         <line id="1">
                 <word id="1">
                         <syl length="2">Ar</syl>
                         <syl length="1">ma</syl>
                 </word>
                 <word id="2">
                         <syl length="1">vi</syl>
                         <syl length="2">rum</syl>
                 </word>
                 <syl length="1">que</syl>
                 <word id="3">
                         <syl length="1">ca</syl>
                         <syl length="2">no</syl>
                 </word> ,
                 <word id="4">
                         <syl length="2">Tro</syl>
                         <syl length="2">iae</syl>
                 </word>
                 <word id="5">
                         <syl length="2">qui</syl>
                 </word>
                 <word id="6">
                         <syl length="2">pri</syl>
                         <syl length="1">mus</syl>
                 </word>
                 <word id="7">
                         <syl length="1">ab</syl>
                 </word>
                 <word id="8">
                         <syl length="2">o</syl>
                         <syl length="2">ris</syl>
                 </word>
         </line>
         <line id="2">
                 <word>
                         <syl length="2">li</syl>
                         <syl length="1.5">to</syl>
                         <syl length="1">ra</syl>
                 </word> ,
                 <word id="15">
                         <syl length="2">mul</syl>
                         <syl elide="true" length="1">tum</syl>
                 </word>
                 <word id="16">
                         <syl length="2">il</syl>
                         <syl elide="true" length="1">le</syl>
                 </word>
                 <word id="17">
                         <syl length="2">et</syl>
                 </word>
                 <word id="18">
                         <syl length="2">ter</syl>
                         <syl length="2">ris</syl>
                 </word>
                 <word id="19">
                         <syl length="2">iac</syl>
                         <syl length="2">ta</syl>
                         <syl length="1">tus</syl>
                 </word>
                 <word id="20">
                         <syl length="1">et</syl>
                 </word>
                 <word id="21">
                         <syl length="2">al</syl>
                         <syl length="2">to</syl>
                 </word>
         </line>
</poem>


XSL template:

<xsl:template match="syl">

  <xsl:variable name="line_id">
    <xsl:value-of select="node()/ancestor::line/@id"/>
  </xsl:variable>

  <xsl:variable name="current_quantity">
    <xsl:value-of select="sum(preceding::syl[ancestor::line/@id = $line_id
        and (not(@elide) or @elide='false')]/floor(@length))"/>
  </xsl:variable>

  <xsl:variable name="color">
    <xsl:choose>
      <xsl:when test="@length=2 and ($current_quantity mod 4 = 0)">background-color:#EEFFEE;</xsl:when>
    </xsl:choose>
  </xsl:variable>

  <td style="{$color}"><xsl:value-of select="text()"/></td>

</xsl:template>


======================================================================
Wendell Piez                            
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--






