xsl-list

Re: [xsl] Processing milestoned XML leads to many preceding:: calls and horrible performance

2012-02-21 03:18:58
A sample of the input XML would help, and don't assume everybody knows
what "to milestone" means - it isn't even a verb in the English
language.

-W


On 21/02/2012, Matěj Cepl <mcepl(_at_)redhat(_dot_)com> wrote:
Hi,

I am again working on an XSLT stylesheet to convert a Czech Bible
translation from a home-brew schema to OSIS, and I have run into some
performance problems.

The whole stylesheet is at
https://gitorious.org/sword/czekms-csp_bible/blobs/master/CEP2OSIS.xsl
(and the git repo can be cloned from ...), but I believe the relevant parts are:

     <xsl:template name="genRef">
         <xsl:variable name="refKniha" select="//kniha[1]/@jmeno"/>
         <xsl:variable name="refKapitola" select="preceding::kap[1]/@n"/>
         <xsl:value-of select="concat($refKniha,'.',$refKapitola,'.')"/>
     </xsl:template>

     <xsl:template name="endVerse">
         <xsl:param name="rBase" />
         <xsl:element name="verse">
             <xsl:variable name="prevVerseID">
                 <xsl:value-of select="./preceding::vers[1]/@n" />
             </xsl:variable>
             <xsl:attribute name="eID">
                 <xsl:value-of select="concat($rBase,$prevVerseID)" />
             </xsl:attribute>
         </xsl:element>
     </xsl:template>

     <!-- ... -->

     <xsl:template match="vers">
         <xsl:variable name="refBase">
             <xsl:call-template name="genRef" />
         </xsl:variable>
         <xsl:variable name="refID" select="concat($refBase,./@n)" />
         <!-- Find out whether this is the first verse in a chapter;
              notice that the <kap/> element is milestoned as well,
              so we have to count the distance in verses from it,
              rather than use plain count() -->
         <xsl:variable name="curPos"
             select="count(./preceding::kap[1]/following::*[not(count(preceding-sibling::vers|current()) = count(preceding-sibling::vers))])" />
         <xsl:if test="not($curPos=1)">
             <xsl:call-template name="endVerse">
                 <xsl:with-param name="rBase">
                     <xsl:value-of select="$refBase" />
                 </xsl:with-param>
             </xsl:call-template>
         </xsl:if>
         <xsl:element name="verse">
             <xsl:attribute name="sID">
                     <xsl:value-of select="$refID" />
                 </xsl:attribute>
             <xsl:attribute name="osisID">
                     <xsl:value-of select="$refID" />
                 </xsl:attribute>
         </xsl:element>
     </xsl:template>

This works (at least as far as I was able to test it, given the
circumstances), but the performance is absolutely dreadful. The book of
Genesis alone took almost an hour to process, with one core of my
dual-core CPU constantly at 100%.

Obviously the problem is that <xsl:variable name="curPos"/>. I have read
all over the Internet about how horribly inefficient the preceding* axes
are, but unfortunately I haven't figured out any other way to do what I
am doing, and most laments about the preceding* axes don't offer many
hints either.
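A rough estimate of where the time goes, assuming <kap/> and <vers/> are
siblings in one long flat list: for every <vers/>, curPos walks back to
the previous <kap/>, then visits every element after it up to the end of
the document, and for each of those builds and counts its
preceding-sibling::vers set. That is roughly quadratic work per verse and
cubic work for the whole book. Below is a minimal, untested sketch of one
possible rewrite. It replaces curPos with a test on the nearest preceding
milestone of either kind, and hoists //kniha[1]/@jmeno into a top-level
variable so it is evaluated once rather than once per verse. The
top-level variable name refKniha and the variable name prevMarker are
assumptions and must not clash with anything else in CEP2OSIS.xsl.

     <!-- Sketch only: assumes the rest of CEP2OSIS.xsl (including the
          endVerse template) stays as it is. -->

     <!-- Top-level: evaluated once per transformation, not per verse. -->
     <xsl:variable name="refKniha" select="//kniha[1]/@jmeno"/>

     <xsl:template name="genRef">
         <xsl:variable name="refKapitola" select="preceding::kap[1]/@n"/>
         <xsl:value-of select="concat($refKniha,'.',$refKapitola,'.')"/>
     </xsl:template>

     <xsl:template match="vers">
         <xsl:variable name="refBase">
             <xsl:call-template name="genRef"/>
         </xsl:variable>
         <xsl:variable name="refID" select="concat($refBase,@n)"/>
         <!-- Nearest preceding milestone of either kind. If it is a
              <vers/>, this verse is not the first one in its chapter,
              so the previous verse has to be closed first. -->
         <xsl:variable name="prevMarker"
             select="preceding::*[self::kap or self::vers][1]"/>
         <xsl:if test="$prevMarker[self::vers]">
             <xsl:call-template name="endVerse">
                 <xsl:with-param name="rBase" select="$refBase"/>
             </xsl:call-template>
         </xsl:if>
         <xsl:element name="verse">
             <xsl:attribute name="sID">
                 <xsl:value-of select="$refID"/>
             </xsl:attribute>
             <xsl:attribute name="osisID">
                 <xsl:value-of select="$refID"/>
             </xsl:attribute>
         </xsl:element>
     </xsl:template>

The output may differ from the original in edge cases, so it is worth
comparing the two on real data. A <vers/> with no preceding <kap/> no
longer triggers endVerse. Likewise, an element sitting between a <kap/>
and its first <vers/> no longer makes that verse count as a non-first
one. Whether libxslt stops at the first matching preceding node is
implementation-dependent. Even if it does not, each verse costs at most
one backward pass instead of a nested scan.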

The problem is (I think) that both <vers/> ("verse" in Czech) and <kap/>
(short for "kapitola", "chapter") are just milestones, so I have to go
through all the verses in the whole book every time (yes, this is
http://www.joelonsoftware.com/articles/fog0000000319.html all over again).
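Roughly, "milestoned" means that chapter and verse boundaries are marked
by empty elements placed in the running text, instead of by elements that
wrap each chapter or verse. Below is a hypothetical fragment. The element
and attribute names (kniha/@jmeno, kap/@n, vers/@n) come from the
stylesheet above, but the value "Gen" and the flat nesting are
assumptions, not the real CEP data.

     <!-- Hypothetical illustration only; the real markup may nest
          these elements differently. -->
     <kniha jmeno="Gen">
       <kap n="1"/>
       <vers n="1"/>text of Genesis 1:1 ...
       <vers n="2"/>text of Genesis 1:2 ...
       <kap n="2"/>
       <vers n="1"/>text of Genesis 2:1 ...
     </kniha>

Nothing inside a <vers/> says which chapter it belongs to. That can only
be recovered by looking backwards in document order for the nearest
<kap/>, which is why the stylesheet keeps reaching for the preceding::
axis.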

Any ideas? Would an XSLT processor other than the xsltproc I am using
(libxml 20706, libxslt 10126 and libexslt 815) be able to optimize this
somehow?

Thanks a lot,

Matěj

--
http://www.ceplovi.cz/matej/, Jabber: mcepl<at>ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC

в чужой монастырь со своим уставом не ходят.
(You don't bring your own rules into someone else's monastery.)
     -- Russian proverb (this time actually checked by a native
        Russian)


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


