xsl-list

[xsl] Efficient dictionary lookup

2012-03-22 16:39:25
Hi all,

As part of a small pilot project, I'm implementing a set of spelling normalization rules applied through XSLT 2.0 using Saxon 9. One operation that happens extremely frequently is a dictionary lookup; basically I'm checking a word form to see if it appears in a spell-checker dictionary.

The dictionary currently consists of a whitespace-separated text string (although it could be formatted any way I choose), and I've been using fn:matches() and fn:contains() to check whether or not the form appears in the dictionary:

  <!-- Option 1: substring test against the space-delimited dictionary -->
  <xsl:function name="f:wordExists" as="xs:boolean">
    <xsl:param name="inString" as="xs:string"/>
    <xsl:sequence select="contains($dictModern, concat(' ', lower-case($inString), ' '))"/>
  </xsl:function>

  <!-- Option 2: case-insensitive regex test; note that $inString is not regex-escaped -->
  <xsl:function name="f:wordExists" as="xs:boolean">
    <xsl:param name="inString" as="xs:string"/>
    <xsl:sequence select="matches($dictModern, concat('\s', $inString, '\s'), 'i')"/>
  </xsl:function>

Both options appear to be very costly in terms of time, and I'm wondering what the most efficient way to do this might be. Is there a faster way to do text lookups like this?

Ultimately I guess I'll implement this as an external Java process, but for the moment I'm working with XSLT, and I'd like to get some speed improvement if I can.
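
One direction that might be worth testing is to split the dictionary into words once and look them up through xsl:key instead of rescanning the whole string on every call. The following is only a sketch under assumptions, not something measured: the inline $dictModern value is a stand-in for the real dictionary, the dictionary is assumed to be all lower case, and the names dictDoc, dictWord, main and the f namespace URI are made up for illustration. Processors typically build an index for a key on first use, so repeated lookups should avoid a linear scan, but that is an implementation detail rather than something the spec guarantees.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.org/functions"
    exclude-result-prefixes="xs f">

  <!-- Stand-in for the real dictionary string (assumed to be all lower case) -->
  <xsl:variable name="dictModern" as="xs:string" select="'apple banana cherry'"/>

  <!-- Temporary tree with one <w> element per dictionary word, built once -->
  <xsl:variable name="dictDoc">
    <xsl:for-each select="tokenize(normalize-space($dictModern), ' ')">
      <w><xsl:value-of select="."/></w>
    </xsl:for-each>
  </xsl:variable>

  <!-- Key over the word elements; the key name is hypothetical -->
  <xsl:key name="dictWord" match="w" use="string(.)"/>

  <!-- Same signature as before, but the lookup goes through the key -->
  <xsl:function name="f:wordExists" as="xs:boolean">
    <xsl:param name="inString" as="xs:string"/>
    <xsl:sequence select="exists(key('dictWord', lower-case($inString), $dictDoc))"/>
  </xsl:function>

  <!-- Named entry point for a quick check without a source document -->
  <xsl:template name="main">
    <result banana="{f:wordExists('Banana')}" kiwi="{f:wordExists('kiwi')}"/>
  </xsl:template>

</xsl:stylesheet>
```

Run with Saxon's -it:main option, this should report true for "Banana" and false for "kiwi" against the stand-in dictionary. The three-argument form of key() is what lets the lookup target the temporary tree rather than the source document.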

All help appreciated,
Martin


