xsl-list

RE: [xsl] Stylesheet Optimization -- How to Make It Faster

2006-11-28 02:14:33
(a) It would be a nice courtesy if you could lay out the code so that we can 
read it.

(b) What XSLT processor are you using?

(c) The most obvious inefficiency is here:
    expand="{$abbreviations[.=$abbr]/following-sibling::expanded}"
    This would benefit from use of keys.
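
    A minimal sketch of what the key-based lookup in point (c) might look
    like. It is not part of the original message. The key name
    "abbr-by-text", the variable name "publishers" and the named template
    "expand-abbr" are hypothetical; the element names come from the
    publishers_data.xml sample quoted below. It assumes an XSLT 2.0
    processor, since it uses the three-argument form of key() to search
    the reference document rather than the source document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Index every abbrev element by its string value (hypothetical key name). -->
  <xsl:key name="abbr-by-text" match="abbrev" use="string(.)"/>

  <!-- Load the reference document once; key() searches within this tree. -->
  <xsl:variable name="publishers" select="document('publishers_data.xml')"/>

  <!-- Builds the abbr element for one abbreviation string. The key lookup
       stands in for the filter $abbreviations[.=$abbr], which compares
       against every abbrev element on each call. -->
  <xsl:template name="expand-abbr">
    <xsl:param name="abbr" as="xs:string"/>
    <abbr type="title"
        expand="{key('abbr-by-text', $abbr, $publishers)/following-sibling::expanded}">
      <xsl:value-of select="$abbr"/>
    </abbr>
  </xsl:template>

</xsl:stylesheet>
```

    With a key, the processor can build an index over the abbrev elements
    once and reuse it for each lookup. How much this helps in practice
    depends on the processor and on the number of abbreviations.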

Michael Kay
http://www.saxonica.com/
 

-----Original Message-----
From: Jeff Sese [mailto:jsese(_at_)asiatype(_dot_)com] 
Sent: 28 November 2006 01:41
To: Xsl-List
Subject: [xsl] Stylesheet Optimization -- How to Make It Faster

I have a stylesheet that adds markup to text nodes that 
match an abbreviation in a reference XML file. It's working 
nicely, but the processing time is very slow... I'm guessing 
because it's processing text nodes. An 800 KB file takes me 
about 25 minutes to process, and I have around 800 files to 
process (varying file sizes; some are relatively small and 
some are fairly large). Is there any way to optimize my 
stylesheet so that it can process the files faster?

Here is my stylesheet:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:ati="http://www.asiatype.com/xslt-functions"
    exclude-result-prefixes="xs ati">

  <xsl:output method="xml" version="1.0" encoding="UTF-8"/>

  <xsl:variable name="abbreviations" as="element()+"
      select="document('publishers_data.xml')/root/publisher/abbrev"/>

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]">
    <xsl:variable name="str" as="xs:string" select="."/>
    <xsl:choose>
      <xsl:when test="exists($abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))])">
        <xsl:variable name="search-str" as="xs:string+"
            select="$abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))]"/>
        <xsl:variable name="replace" as="element()*">
          <xsl:for-each select="$search-str">
            <xsl:variable name="abbr" as="xs:string" select="."/>
            <abbr type="title"
                expand="{$abbreviations[.=$abbr]/following-sibling::expanded}">
              <xsl:value-of select="$abbr"/>
            </abbr>
          </xsl:for-each>
        </xsl:variable>
        <xsl:sequence select="ati:replace-with-nodes($str, $search-str, $replace)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$str"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="@*|element()|comment()|processing-instruction()" mode="#all">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:function name="ati:replace-with-nodes" as="node()+">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="words-to-replace" as="xs:string*"/>
    <xsl:param name="replacement" as="node()*"/>
    <xsl:variable name="regex"
        select="string-join(for $w in $words-to-replace return concat('(', ati:escape($w), ')'),'|')"/>
    <xsl:analyze-string select="$input" regex="{$regex}">
      <xsl:matching-substring>
        <xsl:variable name="i" as="xs:integer"
            select="(1 to count($words-to-replace))[regex-group(.)]"/>
        <xsl:sequence select="$replacement[$i]"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <xsl:function name="ati:escape">
    <xsl:param name="s" as="xs:string"/>
    <xsl:sequence
        select="replace($s,'[\\\|\.\-\^\?\*\+\(\)\{\}\[\]\$]','\\$0')"/>
  </xsl:function>

</xsl:stylesheet>

Here's a short version of publishers_data.xml:

<root>
  <publisher>
    <abbrev>Inschriften von Priene</abbrev>
    <expanded>Inschriften von Priene</expanded>
  </publisher>
  <publisher>
    <abbrev>P. Mil. Congr. XVIII</abbrev>
    <expanded>Papiri documentari dell'Università Cattolica di Milano</expanded>
  </publisher>
  <publisher>
    <abbrev>P. Jud. Des. Misc.</abbrev>
    <expanded>Discoveries in the Judean Desert XXXVIII</expanded>
  </publisher>
  <!-- more publishers here -->
</root>

Here's a snippet of the source XML:

<!-- preceding::node() of ab -->
<ab lang="grk" n="1">
<foreign lang="grk">· γέγονε κατὰ τοὺς Δαρείου</foreign> 
<note place="margin">a c</note> <lb n="5"/> <foreign 
lang="grk">χρόνους τοῦ μετὰ Καμβύσην βασιλεύσαντος, ὅτε καὶ 
Διονύσιος ἦν ὁ Μιλήσιος</foreign> <lb/>(III), <foreign 
lang="grk">ἐπὶ τῆς ξ¯ε¯ ὀλυμπιάδος</foreign> (520/16)<foreign 
lang="grk">· ἱστοριογράφος. ῾Ηρόδοτος δὲ ὁ ῾Αλι-</foreign> 
<note place="margin">v</note> <lb/> <foreign 
lang="grk">καρνασεὺς ὠφέληται τούτου, νεώτερος ὤν. καὶ ἦν 
ἀκουστὴς Πρωταγόρου</foreign> <note id="n7" n="7" lang="ger"> 
<foreign lang="grk">ὤν· γέγονε γὰρ μετ᾽ αὐτόν</foreign> 
A</note> <lb/> <foreign lang="grk">ὁ ῾Εκαταῖος. πρῶτος δὲ 
ἱστορίαν πεζῶς ἐξήνεγκε, συγγραφὴν δὲ Φερεκύδης</foreign> 
<note id="n8—9" n="8—9" lang="ger"> <foreign 
lang="grk">πρῶτος—νοθεύεται</foreign> wiederholt s. <foreign 
lang="grk">ὶστορῆσαι</foreign>, s. <foreign 
lang="grk">συγγραφεῖς</foreign>.</note>
<lb/>(I 3). <foreign lang="grk">τὰ γὰρ ᾽Ακουσιλάου</foreign> 
(<link type="boj" targets="a002" n="BOJTEXT002_T_7">2 T 
7</link>) <foreign lang="grk">νοθεύεται.</foreign> <note 
id="n9" n="9" lang="ger"> <foreign 
lang="grk">᾽Ακουσιλάου</foreign> Vossius <foreign 
lang="grk">᾽Αγησιλάου</foreign> Suid</note> </ab>
<!-- following::node() of ab -->

Note: all ab nodes appear at the same level (same depth) throughout.

Any suggestions are welcome.

Thanks,
--
Jeff

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


