xsl-list

Re: [xsl] Efficient dictionary lookup

2012-03-22 16:44:49
On 22/03/2012 21:39, Martin Holmes wrote:
> Hi all,
>
> As part of a small pilot project, I'm implementing a set of spelling
> normalization rules applied through XSLT 2.0 using Saxon 9. One
> operation that happens extremely frequently is a dictionary lookup;
> basically I'm checking a word form to see if it appears in a
> spell-checker dictionary.
>
> The dictionary currently consists of a whitespace-separated text string
> (although it could be formatted any way I choose), and I've been using
> fn:matches() and fn:contains() to check whether or not the form appears
> in the dictionary:
>
> <xsl:function name="f:wordExists" as="xs:boolean">
> <xsl:param name="inString" as="xs:string"/>
> <xsl:value-of select="contains($dictModern, concat(' ',
> lower-case($inString), ' '))"/>
> </xsl:function>
>
> <xsl:function name="f:wordExists" as="xs:boolean">
> <xsl:param name="inString" as="xs:string"/>
> <xsl:value-of select="matches($dictModern, concat('\s', $inString,
> '\s'), 'i')"/>
> </xsl:function>
>
> Both options appear to be very costly in terms of time, and I'm
> wondering what the most efficient way to do this might be. Is there a
> faster way to do text lookups like this?
>
> Ultimately I guess I'll implement this as an external Java process, but
> for the moment I'm working with XSLT, and I'd like to get some speed
> improvement if I can.
>
> All help appreciated,
> Martin
>

Sounds like you want to use a key; the processor will then almost certainly build an efficient lookup index for it.

If your dictionary is in a file, say dict.xml:
<words>
<word>one</word>
<word>hello</word>
</words>

then

<xsl:key name="w" match="word" use="."/>

declares the index and

key('w',$word,doc('dict.xml'))

will return the matching word element if the word is in the dictionary, and an empty sequence if it is not.
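
Putting the pieces together, the lookup function from the question could be rewritten around the key roughly as follows. This is only a sketch under some assumptions: the f namespace URI is a placeholder, dict.xml is assumed to sit next to the stylesheet, and the dictionary entries are assumed to be stored in lower case already.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="http://example.org/functions"
    exclude-result-prefixes="xs f">

  <!-- Index every <word> element by its string value. -->
  <xsl:key name="w" match="word" use="."/>

  <!-- Load the dictionary once. The relative URI is resolved against
       the stylesheet location (assumption: dict.xml lives beside it). -->
  <xsl:variable name="dict" select="doc('dict.xml')"/>

  <xsl:function name="f:wordExists" as="xs:boolean">
    <xsl:param name="inString" as="xs:string"/>
    <!-- The third argument of key() selects the document to search;
         exists() turns "some node or none" into a boolean.
         Assumption: dictionary entries are lower case. -->
    <xsl:sequence
        select="exists(key('w', lower-case($inString), $dict))"/>
  </xsl:function>

</xsl:stylesheet>
```

With the dict.xml above, f:wordExists('Hello') would be true and f:wordExists('goodbye') false. Using xsl:sequence rather than xsl:value-of returns the boolean directly instead of going through a text node.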

David

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--
