xsl-list

[xsl] Processing milestoned XML leads to many preceding:: calls and horrible performance

2012-02-21 03:04:41
Hi,

I am again working on an XSLT stylesheet to convert a Czech Bible translation from a home-brew schema to OSIS, and I have run into some performance problems.

The whole stylesheet is at https://gitorious.org/sword/czekms-csp_bible/blobs/master/CEP2OSIS.xsl (and the git repo can be cloned from ...), but I believe the relevant parts are:

    <xsl:template name="genRef">
        <xsl:variable name="refKniha" select="//kniha[1]/@jmeno"/>
        <xsl:variable name="refKapitola" select="preceding::kap[1]/@n"/>
        <xsl:value-of select="concat($refKniha,'.',$refKapitola,'.')"/>
    </xsl:template>

    <xsl:template name="endVerse">
        <xsl:param name="rBase" />
        <xsl:element name="verse">
            <xsl:variable name="prevVerseID">
                <xsl:value-of select="./preceding::vers[1]/@n" />
            </xsl:variable>
            <xsl:attribute name="eID">
                <xsl:value-of select="concat($rBase,$prevVerseID)" />
            </xsl:attribute>
        </xsl:element>
    </xsl:template>

    <!-- ... -->

    <xsl:template match="vers">
        <xsl:variable name="refBase">
            <xsl:call-template name="genRef" />
        </xsl:variable>
        <xsl:variable name="refID" select="concat($refBase,./@n)" />
        <!-- Find out whether this is a first verse in a chapter; notice that
             <kap/> element is milestoned as well, so we have to count a distance
             in <verse/> elements from it, rather than use plain count() -->
        <xsl:variable name="curPos"
            select="count(./preceding::kap[1]/following::*[not(count(preceding-sibling::vers|current()) = count(preceding-sibling::vers))])" />
        <xsl:if test="not($curPos=1)">
            <xsl:call-template name="endVerse">
                <xsl:with-param name="rBase">
                    <xsl:value-of select="$refBase" />
                </xsl:with-param>
            </xsl:call-template>
        </xsl:if>
        <xsl:element name="verse">
            <xsl:attribute name="sID">
                <xsl:value-of select="$refID" />
            </xsl:attribute>
            <xsl:attribute name="osisID">
                <xsl:value-of select="$refID" />
            </xsl:attribute>
        </xsl:element>
    </xsl:template>

This works (at least as far as I was able to test it, given the circumstances), but the performance is absolutely dreadful. The book of Genesis alone took almost an hour to process (with one core of my dual-core CPU constantly at 100%).

Obviously the problem is the <xsl:variable name="curPos"/> computation. I have read all over the Internet about how horribly inefficient the preceding* axes are, but unfortunately I haven't figured out any other way to do what I am doing, and most laments about the preceding* axes don't provide many hints either.

The problem is (I think) that both <vers/> (that's "verse" in Czech) and <kap/> (that's an abbreviation of "chapter") are just milestones, so I have to go through all the verses in the whole book all the time (yes, this is http://www.joelonsoftware.com/articles/fog0000000319.html all over again).
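
To illustrate what "milestoned" means here, a simplified, made-up fragment of the input could look roughly like this (only <kniha/>, <kap/> and <vers/> with their attributes come from the stylesheet above; the <odstavec/> wrapper, the book name and the placeholder text are assumptions just for this example):

    <kniha jmeno="Gen">
        <odstavec>
            <kap n="1"/>
            <vers n="1"/>Text of the first verse ...
            <vers n="2"/>Text of the second verse ...
        </odstavec>
    </kniha>

and the OSIS milestones the templates above are meant to produce for it would be something like:

    <verse sID="Gen.1.1" osisID="Gen.1.1"/>Text of the first verse ...
    <verse eID="Gen.1.1"/>
    <verse sID="Gen.1.2" osisID="Gen.1.2"/>Text of the second verse ...

So neither a chapter nor a verse actually contains its text, and the only way I have found to learn which chapter or verse I am in is to look backwards from every <vers/>.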

Any ideas? Would an XSLT processor other than the xsltproc (libxml 20706, libxslt 10126 and libexslt 815) I am using be able to optimize this somehow?

Thanks a lot,

Matěj

--
http://www.ceplovi.cz/matej/, Jabber: mcepl<at>ceplovi.cz
GPG Finger: 89EF 4BC6 288A BF43 1BAB  25C3 E09F EF25 D964 84AC

в чужой монастырь со своим уставом не ходят.
    -- Russian proverb (this time actually checked by a native
       Russian)

