xsl-list

Re: [xsl] Processing two documents, which order?

2011-04-07 09:26:12
On 07/04/2011 14:25, Dave Pawson wrote:

I have two xml documents.
The first is a list of marked up words (1),
the second a 'normal' xml document (2)

For each occurrence in 2 of a word from 1
I need to mark up the word with <property> </property>

Which order is anywhere near optimum?
Document 1 has about 300 words,
Document 2 is 33,000 lines.

I'm having trouble seeing how this description of the problem relates to the code given below.

From first principles, if you do a nested loop then you're doing either 300*33000 operations or 33000*300 - it's not a big difference either way. On the other hand, if you use keys, then you are basically doing 300+33000 operations either way - but the key will be smaller if you build it on the smaller document, so that's what I would do.
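
As a rough sketch of the key-based approach (not the original poster's code): the key is built on the smaller document and every word in the larger document is looked up once. It assumes XSLT 2.0, a hypothetical word list in words.xml shaped like <words><word>alpha</word>...</words>, and that a "word" is a run of \w characters. The file name, the element names and the tokenising rule are all assumptions.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed word list (document 1); the file name is hypothetical -->
  <xsl:variable name="words" select="doc('words.xml')"/>

  <!-- Index the smaller document: one key entry per listed word -->
  <xsl:key name="word" match="word" use="string(.)"/>

  <!-- Identity copy for everything that is not a text node -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Split each text node into word and non-word runs using a static
       regex, then do one key lookup per word -->
  <xsl:template match="text()" priority="1">
    <xsl:analyze-string select="." regex="\w+">
      <xsl:matching-substring>
        <xsl:choose>
          <!-- Three-argument key(): search the word list, not the input -->
          <xsl:when test="exists(key('word', ., $words))">
            <property><xsl:value-of select="."/></property>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="."/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the regex here is a fixed string, it can be compiled once, and each word in document 2 costs a single key lookup. The lookup is case-sensitive and matches whole words only, so it would need adjusting if the word list contains phrases or punctuation.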

Using regex matching with a dynamically computed regex looks like bad news - or is it really a regex in the source document? Saxon precompiles the regex if it's known statically, but if not there's no caching or anything - it gets compiled on each use. From this viewpoint, using each regex once (in a single analyze-string call) is going to be better.

Michael Kay
Saxonica

This is the template to do the work

<xsl:template match="*">
  <xsl:param name="property" as="xs:string"/>
  <xsl:analyze-string select="." regex="({$property})[\s\p{{P}}]">
    <xsl:matching-substring>
      <!-- <xsl:message>match on [<xsl:value-of select="regex-group(1)"/>]</xsl:message> -->
      <property><xsl:value-of select="regex-group(1)"/></property>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:copy-of select="."/>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

but I'm hesitating as to which loop sequence will work best?



