On 07/04/2011 14:25, Dave Pawson wrote:
> I have two XML documents.
> The first is a list of marked up words (1),
> the second a 'normal' XML document (2)
> For each occurrence in 2 of a word from 1
> I need to mark up the word with <property> </property>
> Which order is anywhere near optimum?
> Document 1 has about 300 words,
> Document 2 is 33,000 lines.
I'm having trouble seeing how this description of the problem relates to
the code given below.
From first principles, if you do a nested loop then you're doing either
300*33000 operations or 33000*300 - it's not a big difference either way.
On the other hand if you use keys, then you are basically doing
300+33000 operations either way - but the key will be smaller if you
build it on the smaller document, so that's what I would do.
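A minimal sketch of the key-based approach, not a tested solution. It
assumes (none of this is in the original post) that the word list lives in
a file called words.xml, that each word is a <word> element, and that every
word is a single token matched by \w+. Each token in document 2 then costs
one key lookup instead of a pass over all 300 words. The regex here is a
fixed string, so it is known statically.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: words.xml (hypothetical name) holds the ~300 words as <word> elements -->
  <xsl:variable name="words" select="document('words.xml')"/>

  <!-- Key built on the smaller document: one index entry per listed word -->
  <xsl:key name="listed" match="word" use="normalize-space(.)"/>

  <!-- Identity copy for everything that is not a text node -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split each text node into word and non-word runs; one key lookup per word.
       The priority avoids a conflict with node() in the identity template. -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="\w+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- Third argument makes key() search the word-list document -->
          <xsl:when test="exists(key('listed', ., $words))">
            <property><xsl:value-of select="."/></property>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

The lookup is an exact, case-sensitive string comparison; hyphenated or
multi-word entries in the list would need a different tokenising regex.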
Using regex matching with a dynamically computed regex looks like bad
news - or is it really a regex in the source document? Saxon precompiles
the regex if it's known statically, but if not there's no caching or
anything - it gets compiled on each use. From this viewpoint, using each
regex once (in a single analyze-string call) is going to be better.
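If the words have to stay in the regex, one way to get closer to "each
regex once" is to fold the whole list into a single alternation, so there
is one dynamic regex instead of one per word. The sketch below is untested
and rests on assumptions that are not in the original post: a hypothetical
words.xml with <word> elements, a non-empty list, and words that contain no
regex metacharacters. The regex is still supplied through an attribute
value template, so going by the description above it would presumably be
compiled once per analyze-string evaluation (here, once per text node).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: words.xml (hypothetical name) holds the words as <word> elements,
       and none of them contains a regex metacharacter -->
  <xsl:variable name="words" as="xs:string*"
      select="document('words.xml')//word/normalize-space(.)"/>

  <!-- One alternation over all words, followed by the delimiter from the
       original template, captured as group 2 so that it can be written back -->
  <xsl:variable name="regex" as="xs:string"
      select="concat('(', string-join($words, '|'), ')([\s\p{P}])')"/>

  <!-- Identity copy for everything that is not a text node -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A single analyze-string call per text node handles every word at once -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="{$regex}">
      <xsl:matching-substring>
        <property><xsl:value-of select="regex-group(1)"/></property>
        <!-- Re-emit the whitespace or punctuation consumed by the match -->
        <xsl:value-of select="regex-group(2)"/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

As with the pattern in the original template, a word at the very end of a
text node (with no following space or punctuation) is not matched, and a
listed word can match as the tail of a longer word.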
Michael Kay
Saxonica
> This is the template to do the work
>
> <xsl:template match="*">
>   <xsl:param name="property" as="xs:string"/>
>   <xsl:analyze-string select="." regex="({$property})[\s\p{{P}}]">
>     <xsl:matching-substring>
>       <!-- <xsl:message>match on [<xsl:value-of select='regex-group(1)'/>]</xsl:message> -->
>       <property><xsl:value-of select="regex-group(1)"/></property>
>     </xsl:matching-substring>
>     <xsl:non-matching-substring>
>       <xsl:copy-of select="."/>
>     </xsl:non-matching-substring>
>   </xsl:analyze-string>
> </xsl:template>
>
> but I'm hesitating as to which loop sequence will work best?
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list