On 08/04/2011 11:00, Dave Pawson wrote:
15 minutes' run time is good for that sort of comparison!
Michael Kay and Tony Nassar have already suggested just tokenizing the
words with a fixed regex (which can then be compiled once) and using a key
to check which words you want to mark up (which should be fast).
How long does this take on your real data?
doc1.xml
<x>
<word>one</word>
<word>two</word>
<word>three</word>
<word>threesome</word>
<word>x-ray</word>
</x>
doc2.xml
<body>
<p id="a">one hmmm not-one zzzzz three</p>
<p id="b">a two one tone three</p>
<p>zzz hhh aaa iii aaa x-ray hhh</p>
</body>
dp.xsl
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- index the <word> elements by their string value -->
<xsl:key name="w" match="word" use="."/>
<xsl:template match="node()">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="text()" priority="2">
<!-- one fixed regex: a letter followed by lower-case letters or hyphens -->
<xsl:analyze-string select="." regex="[A-Za-z][a-z\-]+">
<xsl:matching-substring>
<xsl:choose>
<xsl:when test="key('w',.,doc('doc1.xml'))">
<property>
<xsl:value-of select="."/>
</property>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:matching-substring>
<xsl:non-matching-substring>
<xsl:value-of select="."/>
</xsl:non-matching-substring>
</xsl:analyze-string>
</xsl:template>
</xsl:stylesheet>
saxon9 doc2.xml dp.xsl
<?xml version="1.0" encoding="UTF-8"?><body>
<p id="a"><property>one</property> hmmm not-one zzzzz <property>three</property></p>
<p id="b">a <property>two</property> <property>one</property> tone <property>three</property></p>
<p>zzz hhh aaa iii aaa <property>x-ray</property> hhh</p>
</body>
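A possible variant, not part of the original reply: pass the location of the word list as a stylesheet parameter (the name `wordlist-uri` is a hypothetical choice) and bind the document once in a global variable, which is then given to key() as its third argument. Since doc() is stable within a transformation the processor need not re-read the file either way, so this is about convenience rather than speed; the tokenizing regex and the keyed lookup are unchanged.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- hypothetical parameter: location of the word list (doc1.xml above) -->
  <xsl:param name="wordlist-uri" select="'doc1.xml'"/>

  <!-- load the word list once; key() searches it via its third argument -->
  <xsl:variable name="words" select="doc($wordlist-uri)"/>

  <!-- index the <word> elements by their string value -->
  <xsl:key name="w" match="word" use="."/>

  <!-- copy elements and their attributes through unchanged -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- split each text node into candidate words with one fixed regex -->
  <xsl:template match="text()" priority="2">
    <xsl:analyze-string select="." regex="[A-Za-z][a-z\-]+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- keyed lookup: is this token in the word list? -->
          <xsl:when test="key('w', ., $words)">
            <property><xsl:value-of select="."/></property>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>
</xsl:stylesheet>
```

Saxon 9 accepts stylesheet parameters on the command line as name=value, so, assuming the `saxon9` wrapper passes its arguments through, something like `saxon9 doc2.xml dp.xsl wordlist-uri=words.xml` would point it at a different list (`words.xml` is a placeholder).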
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--