[xsl] Stylesheet Optimization -- How to Make It Faster

I have a stylesheet that adds mark-up to text nodes that match an abbreviation in a reference XML file. It's working nicely, but the processing time is very slow... I'm guessing because it's processing text nodes. An 800 KB file takes about 25 minutes to process, and I have around 800 files to process (varying file sizes: some are relatively small and some are fairly large). Is there any way to optimize my stylesheet so that it can process the files faster?


Here is my stylesheet:

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ati="http://www.asiatype.com/xslt-functions"
    exclude-result-prefixes="xs ati">

<xsl:output method="xml" version="1.0" encoding="UTF-8"/>

<xsl:variable name="abbreviations" as="element()+" select="document('publishers_data.xml')/root/publisher/abbrev"/>

<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>

<xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]">

<xsl:variable name="str" as="xs:string" select="."/>
<xsl:choose>

<xsl:when test="exists($abbreviations[matches($str, concat('(^|\W)(', ati:escape(.), ')($|\W)'))])">
<xsl:variable name="search-str" as="xs:string+" select="$abbreviations[matches($str, concat('(^|\W)(', ati:escape(.), ')($|\W)'))]"/>

<xsl:variable name="replace" as="element()*">
<xsl:for-each select="$search-str">
<xsl:variable name="abbr" as="xs:string" select="."/>

<abbr type="title" expand="{$abbreviations[.=$abbr]/following-sibling::expanded}"><xsl:value-of select="$abbr"/></abbr>

</xsl:for-each>
</xsl:variable>
<xsl:sequence select="ati:replace-with-nodes($str, $search-str, $replace)"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$str"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
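For comparison, here is a minimal sketch of the same mark-up done with one regex pass per text node. It assumes (not measured) that most of the time goes into the `when` test and the `search-str` variable, which each call `matches()` once per abbreviation for every text node, plus the linear lookup `$abbreviations[.=$abbr]`. The key name `by-abbrev` and the variables `$publishers` and `$abbrev-regex` are hypothetical; it also assumes an XSLT 2.0 processor such as Saxon, that `publishers_data.xml` resolves relative to the stylesheet, and that the file contains at least one non-empty `abbrev`.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ati="http://www.asiatype.com/xslt-functions"
    exclude-result-prefixes="xs ati">

  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>

  <!-- The reference file, loaded once. -->
  <xsl:variable name="publishers" as="document-node()"
      select="document('publishers_data.xml')"/>

  <!-- Hypothetical key: index publisher entries by abbreviation,
       replacing the linear scan $abbreviations[.=$abbr]. -->
  <xsl:key name="by-abbrev" match="publisher" use="abbrev"/>

  <!-- All non-empty abbreviations as one alternation, built once per run.
       Longest first, so a long abbreviation is tried before a shorter one
       it starts with. Assumes at least one non-empty abbrev exists. -->
  <xsl:variable name="abbrev-regex" as="xs:string">
    <xsl:variable name="sorted" as="xs:string*">
      <xsl:perform-sort
          select="distinct-values($publishers/root/publisher/abbrev[normalize-space()])">
        <xsl:sort select="string-length(.)" order="descending"/>
      </xsl:perform-sort>
    </xsl:variable>
    <xsl:sequence select="concat('(^|\W)(',
        string-join(for $a in $sorted return ati:escape($a), '|'), ')(\W|$)')"/>
  </xsl:variable>

  <!-- Same match pattern as the original; one analyze-string pass per text node. -->
  <xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]">
    <xsl:analyze-string select="." regex="{$abbrev-regex}">
      <xsl:matching-substring>
        <!-- Groups 1 and 3 are the delimiters consumed around the abbreviation. -->
        <xsl:value-of select="regex-group(1)"/>
        <abbr type="title"
              expand="{key('by-abbrev', regex-group(2), $publishers)/expanded}">
          <xsl:value-of select="regex-group(2)"/>
        </abbr>
        <xsl:value-of select="regex-group(3)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- Identity copy for everything else. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Escape regex metacharacters, including parentheses. -->
  <xsl:function name="ati:escape" as="xs:string">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence select="replace($s, '[\\\|\.\-\^\?\*\+\{\}\[\]\$\(\)]', '\\$0')"/>
  </xsl:function>
</xsl:stylesheet>
```

Two things worth checking before relying on it: the delimiter matched by `(\W|$)` is consumed, so when two abbreviations are separated by a single space only the first is marked up (the original's `ati:replace-with-nodes` has no boundaries and would mark both); and whether a processor caches the compiled regex produced by the `regex="{...}"` attribute value template is implementation-dependent.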

<xsl:template match="@*|element()|comment()|processing-instruction()" mode="#all">

<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:function name="ati:replace-with-nodes" as="node()+">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="words-to-replace" as="xs:string*"/>
<xsl:param name="replacement" as="node()*"/>

<xsl:variable name="regex" select="string-join(for $w in $words-to-replace return concat('(', ati:escape($w), ')'), '|')"/>

<xsl:analyze-string select="$input" regex="{$regex}">
<xsl:matching-substring>

<xsl:variable name="i" as="xs:integer" select="(1 to count($words-to-replace))[regex-group(.)]"/>

<xsl:sequence select="$replacement[$i]"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:function>
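<xsl:function name="ati:escape" as="xs:string">
<xsl:param name="s" as="xs:string"/>

<!-- escape regex metacharacters, including ( and ) so an abbreviation cannot open a group -->
<xsl:sequence select="replace($s, '[\\\|\.\-\^\?\*\+\{\}\[\]\$\(\)]', '\\$0')"/>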

</xsl:function>
</xsl:stylesheet>

Here's a short version of publishers_data.xml:

<root>
<publisher>
<abbrev>Inschriften von Priene</abbrev>
<expanded>Inschriften von Priene</expanded>
</publisher>
<publisher>
<abbrev>P. Mil. Congr. XVIII</abbrev>
<expanded>Papiri documentari dell'Università Cattolica di Milano</expanded>
</publisher>
<publisher>
<abbrev>P. Jud. Des. Misc.</abbrev>
<expanded>Discoveries in the Judean Desert XXXVIII</expanded>
</publisher>
<!-- more publishers here -->
</root>

Here's a snippet of the source XML:

<!-- preceding::node() of ab -->
<ab lang="grk" n="1">
<foreign lang="grk">· γέγονε κατὰ τοὺς Δαρείου</foreign>
<note place="margin">a c</note>
<lb n="5"/>

<foreign lang="grk">χρόνους τοῦ μετὰ Καμβύσην βασιλεύσαντος, ὅτε καὶ Διονύσιος ἦν ὁ Μιλήσιος</foreign><lb/>(III), <foreign lang="grk">ἐπὶ τῆς ξ¯ε¯ ὀλυμπιάδος</foreign>(520/16)<foreign lang="grk">· ἱστοριογράφος. ῾Ηρόδοτος δὲ ὁ ῾Αλι-</foreign>

<note place="margin">v</note>
<lb/>

<foreign lang="grk">καρνασεὺς ὠφέληται τούτου, νεώτερος ὤν. καὶ ἦν ἀκουστὴς Πρωταγόρου</foreign>

<note id="n7" n="7" lang="ger">
<foreign lang="grk">ὤν· γέγονε γὰρ μετ᾽ αὐτόν</foreign> A</note>
<lb/>

<foreign lang="grk">ὁ ῾Εκαταῖος. πρῶτος δὲ ἱστορίαν πεζῶς ἐξήνεγκε, συγγραφὴν δὲ Φερεκύδης</foreign>

<note id="n8—9" n="8—9" lang="ger">

<foreign lang="grk">πρῶτος—νοθεύεται</foreign> wiederholt s. <foreign lang="grk">ὶστορῆσαι</foreign>, s. <foreign lang="grk">συγγραφεῖς</foreign>.</note><lb/>(I 3). <foreign lang="grk">τὰ γὰρ ᾽Ακουσιλάου</foreign> (<link type="boj" targets="a002" n="BOJTEXT002_T_7">2 T 7</link>) <foreign lang="grk">νοθεύεται.</foreign>

<note id="n9" n="9" lang="ger">

<foreign lang="grk">᾽Ακουσιλάου</foreign> Vossius <foreign lang="grk">᾽Αγησιλάου</foreign> Suid</note>

</ab>
<!-- following::node() of ab -->

All ab nodes appear at the same level (same depth) throughout.
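Because every ab sits at the same depth, one option is to let a mode carry the "inside ab" state, so the ancestor tests are not evaluated for every text node in the document. This is only a skeleton under that assumption: the mode name `abbr` is hypothetical, the `text()` template in that mode just copies the text as a placeholder for the abbreviation mark-up, and the saving is probably small next to the regex work.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Outside ab: plain identity copy, no abbreviation tests at all. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Entering ab switches its whole subtree to the hypothetical "abbr" mode. -->
  <xsl:template match="ab">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="abbr"/>
    </xsl:copy>
  </xsl:template>

  <!-- Inside ab: identity copy that stays in "abbr" mode. -->
  <xsl:template match="@*|node()" mode="abbr">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="abbr"/>
    </xsl:copy>
  </xsl:template>

  <!-- Notes with @id, @n and @lang are copied unchanged, mirroring the
       original not(ancestor::note[@id and @n and @lang]) exclusion. -->
  <xsl:template match="note[@id and @n and @lang]" mode="abbr">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Placeholder: only text reached in "abbr" mode arrives here, which is
       where the abbreviation mark-up would go. priority="1" avoids a tie
       with node(), which has the same default priority as text(). -->
  <xsl:template match="text()" mode="abbr" priority="1">
    <xsl:value-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```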

Any suggestions are welcome.

Thanks,
--
Jeff

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--~--