Sorry for the messy sample files; my mail client removed the tabs.
I'm using Saxon 8.8J.
I already switched to keys, as you suggested, but I did not notice a
change in the processing time. I'll test more files just to be sure.
Here is my new XSL:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:ati="http://www.asiatype.com/xslt-functions"
exclude-result-prefixes="xs ati">
<xsl:output method="xml" version="1.0" encoding="UTF-8"/>
<xsl:variable name="abbreviations" as="element()+"
select="document('publishers_data.xml')/root/publisher/abbrev"/>
<xsl:key name="abbrev" match="expanded" use="preceding-sibling::abbrev"/>
<xsl:template match="/">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="text()[ancestor::ab and not(ancestor::note[@id and @n and @lang])]
[exists($abbreviations[matches(current(),concat('(^|\W)(',ati:escape(.),')($|\W)'))])]">
<xsl:variable name="str" as="xs:string" select="."/>
<xsl:variable name="search-str" as="xs:string+"
select="$abbreviations[matches($str,concat('(^|\W)(',ati:escape(.),')($|\W)'))]"/>
<xsl:variable name="replace" as="element()*">
<xsl:for-each select="$search-str">
<xsl:variable name="abbr" as="xs:string" select="."/>
<abbr type="title" expand="{$abbreviations/key('abbrev',
$abbr)}">
<xsl:value-of select="$abbr"/>
</abbr>
</xsl:for-each>
</xsl:variable>
<xsl:sequence select="ati:replace-with-nodes($str, $search-str,
$replace)"/>
</xsl:template>
<xsl:template
match="@*|element()|comment()|processing-instruction()" mode="#all">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:function name="ati:replace-with-nodes" as="node()+">
<xsl:param name="input" as="xs:string"/>
<xsl:param name="words-to-replace" as="xs:string*"/>
<xsl:param name="replacement" as="node()*"/>
<xsl:variable name="regex" select="string-join(for $w in
$words-to-replace return concat('(', ati:escape($w), ')'),'|')"/>
<xsl:analyze-string select="$input" regex="{$regex}">
<xsl:matching-substring>
<xsl:variable name="i" as="xs:integer"
select="(1 to count($words-to-replace))[regex-group(.)]"/>
<xsl:sequence select="$replacement[$i]"/>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:function>
<xsl:function name="ati:escape">
<xsl:param name="s" as="xs:string"/>
<xsl:sequence
select="replace($s,'[\\\|\.\-\^\?\*\+\(\)\{\}\[\]\$]','\\$0')"/>
</xsl:function>
</xsl:stylesheet>
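A possible variation on the key lookup, as a minimal sketch rather than a tested fix: in the stylesheet above, `$abbreviations/key('abbrev', $abbr)` calls `key()` once for every `abbrev` element in the variable, because the path expression evaluates its right-hand side per context node. XSLT 2.0 also allows a third argument to `key()` that names the tree to search, so the lookup can be a single call. The variable name `publishers` and the key name `expanded-by-abbrev` are made up for this illustration, and whether this changes the processing time noticeably is an assumption that would need measuring.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

  <!-- Hypothetical variable: the lookup document itself, not its abbrev elements -->
  <xsl:variable name="publishers" as="document-node()"
    select="document('publishers_data.xml')"/>

  <!-- Hypothetical key name: index each expanded element by its sibling abbrev -->
  <xsl:key name="expanded-by-abbrev" match="expanded" use="../abbrev"/>

  <!-- Demo entry point: look up one abbreviation taken from the sample data -->
  <xsl:template match="/">
    <xsl:variable name="abbr" as="xs:string" select="'P. Mil. Congr. XVIII'"/>
    <!-- The third argument tells key() which tree to search, in one call -->
    <abbr type="title" expand="{key('expanded-by-abbrev', $abbr, $publishers)}">
      <xsl:value-of select="$abbr"/>
    </abbr>
  </xsl:template>
</xsl:stylesheet>
```

In the real stylesheet only the `expand` attribute inside the `xsl:for-each` would change; the demo template here exists just so the sketch can be run on its own with publishers_data.xml in the same directory.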
Here's a short version of publishers_data.xml:
<root>
<publisher>
<abbrev>Inschriften von Priene</abbrev>
<expanded>Inschriften von Priene</expanded>
</publisher>
<publisher>
<abbrev>P. Mil. Congr. XVIII</abbrev>
<expanded>Papiri documentari dell'Università Cattolica di Milano</expanded>
</publisher>
<publisher>
<abbrev>P. Jud. Des. Misc.</abbrev>
<expanded>Discoveries in the Judean Desert XXXVIII</expanded>
</publisher>
<!-- more publishers here -->
</root>
Here's a snippet of the source XML:
<!-- preceding::node() of ab -->
<ab lang="grk" n="1">
<foreign lang="grk">· γέγονε κατὰ τοὺς Δαρείου</foreign>
<note place="margin">a c</note> <lb n="5"/> <foreign
lang="grk">χρόνους τοῦ μετὰ Καμβύσην βασιλεύσαντος, ὅτε καὶ
Διονύσιος ἦν ὁ Μιλήσιος</foreign> <lb/>(III), <foreign
lang="grk">ἐπὶ τῆς ξ¯ε¯ ὀλυμπιάδος</foreign> (520/16)<foreign
lang="grk">· ἱστοριογράφος. ῾Ηρόδοτος δὲ ὁ ῾Αλι-</foreign>
<note place="margin">v</note> <lb/> <foreign
lang="grk">καρνασεὺς ὠφέληται τούτου, νεώτερος ὤν. καὶ ἦν
ἀκουστὴς Πρωταγόρου</foreign> <note id="n7" n="7" lang="ger">
<foreign lang="grk">ὤν· γέγονε γὰρ μετ᾽ αὐτόν</foreign>
A</note> <lb/> <foreign lang="grk">ὁ ῾Εκαταῖος. πρῶτος δὲ
ἱστορίαν πεζῶς ἐξήνεγκε, συγγραφὴν δὲ Φερεκύδης</foreign>
<note id="n8—9" n="8—9" lang="ger"> <foreign
lang="grk">πρῶτος—νοθεύεται</foreign> wiederholt s. <foreign
lang="grk">ὶστορῆσαι</foreign>, s. <foreign
lang="grk">συγγραφεῖς</foreign>.</note>
<lb/>(I 3). <foreign lang="grk">τὰ γὰρ ᾽Ακουσιλάου</foreign>
(<link type="boj" targets="a002" n="BOJTEXT002_T_7">2 T
7</link>) <foreign lang="grk">νοθεύεται.</foreign> <note
id="n9" n="9" lang="ger"> <foreign
lang="grk">᾽Ακουσιλάου</foreign> Vossius <foreign
lang="grk">᾽Αγησιλάου</foreign> Suid</note> </ab>
<!-- following::node() of ab -->
Note: all ab nodes appear at the same level (same depth) throughout.
Any suggestions are welcome.
Thanks,
--
Jeff
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--