Re: [xsl] Processing two documents, which order?

2011-04-08 08:14:58
On 08/04/2011 11:00, Dave Pawson wrote:
> 15 minutes run time is good with that sort of comparison!

Michael Kay and Tony Nassar have already suggested simply tokenizing the words with a fixed regex (which can be compiled once) and using a key to check which words you want to mark up (which should be fast).

How long does this take on your real data?


doc1.xml
<x>
<word>one</word>
<word>two</word>
<word>three</word>
<word>threesome</word>
<word>x-ray</word>
</x>

doc2.xml
<body>
<p id="a">one hmmm not-one  zzzzz three</p>
<p id="b">a two one tone three</p>
<p>zzz hhh aaa iii aaa x-ray hhh</p>
</body>


dp.xsl

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:key name="w" match="word" use="."/>

<xsl:template match="node()">
 <xsl:copy>
  <xsl:copy-of select="@*"/>
  <xsl:apply-templates/>
 </xsl:copy>
</xsl:template>

<xsl:template match="text()" priority="2">
 <xsl:analyze-string select="." regex="[A-Za-z][a-z\-]+">
  <xsl:matching-substring>
   <xsl:choose>
    <xsl:when test="key('w',.,doc('doc1.xml'))">
     <property>
      <xsl:value-of select="."/>
     </property>
    </xsl:when>
    <xsl:otherwise>
     <xsl:value-of select="."/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
   <xsl:value-of select="."/>
  </xsl:non-matching-substring>
 </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>


 saxon9 doc2.xml dp.xsl
<?xml version="1.0" encoding="UTF-8"?><body>
<p id="a"><property>one</property> hmmm not-one zzzzz <property>three</property></p>
<p id="b">a <property>two</property> <property>one</property> tone <property>three</property></p>
<p>zzz hhh aaa iii aaa <property>x-ray</property> hhh</p>
</body>
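
The regex above lets a token start with a capital letter, but the key lookup is case-sensitive, so a sentence-initial "Three" would be tokenized and then not found. If you want such tokens marked up as well, one possible variant is sketched below. This is only an assumption about what you might need, not something tested on your data: it indexes the word list by lower-case(.) and looks each token up in lower case, while still writing the token out unchanged. The variable name "words" is just an illustrative choice, and the hyphen in the regex is written as \- here. All-capital words are still not tokenized by this regex.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- index the word list by its lower-cased value (assumed requirement) -->
<xsl:key name="w" match="word" use="lower-case(.)"/>

<!-- hypothetical variable name; holds the word list document -->
<xsl:variable name="words" select="doc('doc1.xml')"/>

<!-- copy everything except text nodes unchanged -->
<xsl:template match="node()">
 <xsl:copy>
  <xsl:copy-of select="@*"/>
  <xsl:apply-templates/>
 </xsl:copy>
</xsl:template>

<xsl:template match="text()" priority="2">
 <xsl:analyze-string select="." regex="[A-Za-z][a-z\-]+">
  <xsl:matching-substring>
   <xsl:choose>
    <!-- look the token up in lower case, but output it as written -->
    <xsl:when test="key('w',lower-case(.),$words)">
     <property>
      <xsl:value-of select="."/>
     </property>
    </xsl:when>
    <xsl:otherwise>
     <xsl:value-of select="."/>
    </xsl:otherwise>
   </xsl:choose>
  </xsl:matching-substring>
  <xsl:non-matching-substring>
   <xsl:value-of select="."/>
  </xsl:non-matching-substring>
 </xsl:analyze-string>
</xsl:template>

</xsl:stylesheet>
```

It runs the same way as dp.xsl. The key is still built once per word list document, so the cost of the lookup should not change noticeably.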
