xsl-list

Re: [xsl] XSLT Solution for hyphenation

2006-12-27 18:56:44
Yes, I'm doing an exact match. I was thinking of using keys, but I don't know how to use them for this kind of lookup; I'm more familiar with using keys for grouping.
Suppose I have this input:

<root>
   <p>I have some text that has the words abaissassent and abandonnent.</p>
</root>

How do I use keys to get this output?

<root>
<p>I have some text that has the words abais&#x00AD;sassent and aban&#x00AD;donnent.</p>
</root>

Here's a sample of the lookup table:

<root>
<wordlist>
  <entry>
    <search>abaissassent</search>
    <replace>abais&#x00AD;sassent</replace>
  </entry>
  <entry>
    <search>abaissèrent</search>
    <replace>abais&#x00AD;sèrent</replace>
  </entry>
  <entry>
    <search>abandonnent</search>
    <replace>aban&#x00AD;donnent</replace>
  </entry>
</wordlist>
</root>
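A minimal sketch of the key-based lookup, assuming an XSLT 2.0 processor (such as Saxon) and assuming the lookup table lives in a separate file; the parameter `dict-uri`, its default `wordlist.xml`, and the key name `entry-by-word` are hypothetical. Rather than testing every dictionary entry against each text node, it splits each text node into words with `xsl:analyze-string` and looks up only those words in a key built over the dictionary.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical parameter: URI of the lookup table shown above. -->
  <xsl:param name="dict-uri" select="'wordlist.xml'"/>

  <!-- Load the dictionary document once; every lookup reuses it. -->
  <xsl:variable name="dict" select="doc($dict-uri)"/>

  <!-- Index each entry by the exact string value of its search child. -->
  <xsl:key name="entry-by-word" match="entry" use="search"/>

  <!-- Identity transform: copy everything else unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- priority="1" beats the identity template's node() pattern (default -0.5). -->
  <xsl:template match="text()" priority="1">
    <!-- Split the text into runs of word characters and everything else. -->
    <xsl:analyze-string select="." regex="\w+">
      <xsl:matching-substring>
        <!-- Three-argument key(): search $dict, not the input document. -->
        <xsl:variable name="hit" select="key('entry-by-word', ., $dict)"/>
        <xsl:value-of select="if ($hit) then string($hit[1]/replace) else ."/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <!-- Spaces and punctuation pass through unchanged. -->
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

On the sample input, "abaissassent" and "abandonnent" would be replaced by their hyphenated forms, and the full stop after "abandonnent" is kept because `\w+` stops at punctuation. Matching is exact and case-sensitive, so "Abandonnent" would not be found unless it is also in the list. The soft hyphens are written out as U+00AD characters, which most viewers do not display.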

-- Jeff


Michael Kay wrote:
You seem to be doing exact matching on the words in your dictionary, not
regular expression matching as your use of matches() would suggest. With
exact matching you can use a key for the lookup, which will be dramatically
faster.
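To illustrate the point about exact matching, here is a hedged sketch contrasting two lookups; the file name `wordlist.xml`, the `f` namespace URI, the function names, and the key name are all hypothetical. `f:lookup-scan` finds an entry with a predicate, which a simple processor evaluates against every entry on each call (some processors optimize this). `f:lookup-key` declares the index explicitly with `xsl:key` and uses the three-argument form of `key()` to search the dictionary document instead of the current input. Neither function needs the metacharacter escaping that `matches()` requires, because no regular expressions are involved.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:hyphenation"
    exclude-result-prefixes="xs f">

  <!-- Hypothetical dictionary location, resolved against the stylesheet URI. -->
  <xsl:variable name="dict" select="doc('wordlist.xml')"/>

  <xsl:key name="entry-by-word" match="entry" use="search"/>

  <!-- Predicate lookup: may compare $word with every entry on each call. -->
  <xsl:function name="f:lookup-scan" as="xs:string">
    <xsl:param name="word" as="xs:string"/>
    <xsl:variable name="hit" select="$dict//entry[search = $word]"/>
    <xsl:sequence select="if ($hit) then string($hit[1]/replace) else $word"/>
  </xsl:function>

  <!-- Key lookup: the processor can build an index over $dict and reuse it. -->
  <xsl:function name="f:lookup-key" as="xs:string">
    <xsl:param name="word" as="xs:string"/>
    <xsl:variable name="hit" select="key('entry-by-word', $word, $dict)"/>
    <xsl:sequence select="if ($hit) then string($hit[1]/replace) else $word"/>
  </xsl:function>

  <!-- Demo: call both functions on one word from the sample table. -->
  <xsl:template match="/">
    <result>
      <scan><xsl:value-of select="f:lookup-scan('abandonnent')"/></scan>
      <key><xsl:value-of select="f:lookup-key('abandonnent')"/></key>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

Both elements should contain the same hyphenated string; the difference only shows up in running time once the word list grows large.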

Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Jeff Sese [mailto:jsese(_at_)asiatype(_dot_)com]
Sent: 22 December 2006 06:10
To: Xsl-List
Subject: [xsl] XSLT Solution for hyphenation

Hi list,

I have a project that applies hyphenation to an XML document using a list of words as a reference. The list can contain up to a million entries. My XSLT solution uses a template that matches text() nodes and inserts hyphens into any words that appear in the list. However, the transformation takes too long to finish, even for a relatively small file (around 1 MB). Is there any way to speed this up, or is there a better solution?

Here's my stylesheet:

<!-- Assumes the global variable $search-words, the key named 'replace',
     and the xs and ati namespace declarations are defined elsewhere in
     the stylesheet; they are not shown in this message. -->
<xsl:template match="/">
    <xsl:apply-templates/>
</xsl:template>
<!-- Identity copy for everything except text nodes. -->
<xsl:template match="@*|element()|comment()|processing-instruction()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
<xsl:template match="text()">
    <xsl:variable name="str" select="."/>
    <!-- Collect every dictionary word found in this text node, with its
         regex metacharacters escaped. -->
    <xsl:variable name="searchStrs" as="xs:string*"
        select="$search-words[matches($str, .)]/replace(., '[.\\?*+{}()\[\]\^\$&#x007C;]', '\\$0')"/>
    <xsl:value-of select="ati:replace-all($str, $searchStrs)"/>
</xsl:template>
<!-- Replace the first word, then recurse on the remaining words. -->
<xsl:function name="ati:replace-all">
    <xsl:param name="input" as="xs:string"/>
    <xsl:param name="words-to-replace" as="xs:string*"/>
    <xsl:sequence select="if (exists($words-to-replace))
        then ati:replace-all(
                 replace($input, $words-to-replace[1],
                         key('replace', $words-to-replace[1], $search-words)),
                 remove($words-to-replace, 1))
        else $input"/>
</xsl:function>

Here's a sample of the lookup table:

<root>
    <wordlist>
        <entry>
            <search>abaissassent</search>
            <replace>abais&#x00AD;sassent</replace>
        </entry>
        <entry>
            <search>abaissèrent</search>
            <replace>abais&#x00AD;sèrent</replace>
        </entry>
        <entry>
            <search>abandonnent</search>
            <replace>aban&#x00AD;donnent</replace>
        </entry>
    </wordlist>
</root>

So if I have "abaissassent" in a text() node, it will be replaced with "abais&#x00AD;sassent".
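When the lookup table is parsed, `&#x00AD;` becomes the soft-hyphen character U+00AD itself, and a UTF-8 serializer typically writes that character directly rather than as a reference. If the output file should literally contain the reference `&#x00AD;`, an XSLT 2.0 character map is one way to get it. This is a minimal sketch; the map name `soft-hyphen` is hypothetical, and the identity template stands in for whatever hyphenation templates the stylesheet actually uses.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- At serialization time, write every U+00AD as the text &#x00AD;. -->
  <xsl:character-map name="soft-hyphen">
    <xsl:output-character character="&#xAD;" string="&amp;#x00AD;"/>
  </xsl:character-map>

  <xsl:output method="xml" encoding="UTF-8" use-character-maps="soft-hyphen"/>

  <!-- Identity transform; combine with the hyphenation templates as needed. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For example, copying a document whose text already contains U+00AD writes each occurrence as `&#x00AD;`. The map applies to the whole result, including attribute values.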

--
*Jeff*

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--