xsl-list

Re: Concordance with XSLT

2005-11-06 00:23:12
Here's a quick result.

The following is an XSLT 2.0 transformation, which produces the
concordance for a given word. The example shows the results for all 56
occurrences of the word "loved" in the Old Testament.

On my 3GHz PC this took 250 milliseconds.

By using the function f:wordConcord() it is straightforward to
produce a complete concordance: first find all unique words in the
text and then invoke

       f:wordConcord()

for every word in this set.
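
A minimal sketch of such a driver, assuming the same element layout as the stylesheet below (verses are v elements) and reusing its tokenizer regex. The variable names vDoc and vWords are illustrative and not part of the original code; the template would take the place of the one matching "/".

```xml
<xsl:template match="/">
  <!-- Keep a handle on the document: inside a for-each over strings
       there is no context node, so "/" cannot be evaluated there. -->
  <xsl:variable name="vDoc" select="/"/>

  <!-- All distinct lower-cased words found in any verse -->
  <xsl:variable name="vWords" select=
   "distinct-values(
      for $v in //v
        return tokenize(lower-case(string($v)), '[\s.?!,;—:\-]+')[.]
    )"/>

  <concord>
    <xsl:for-each select="$vWords">
      <xsl:sort select="."/>
      <!-- One call, and so one scan of the document, per unique word -->
      <xsl:sequence select="f:wordConcord($vDoc, .)"/>
    </xsl:for-each>
  </concord>
</xsl:template>
```

Two caveats follow from how f:wordConcord() is written. It pre-filters verses with a case-sensitive contains(), so a word that only ever occurs capitalized would get no entries from this driver. It also passes the word into a regular expression in f:nthWord(), so a word containing regex metacharacters would need escaping first.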

Certainly, there is a much faster algorithm (which hopefully doesn't
require too much memory), in which the complete concordance is
produced by reading the text just once (rather than once for every
unique word) -- I'll play with this when there's some free time again.
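
One possible shape for a single-pass version is sketched here with xsl:for-each-group. It relies on the XSLT 2.0 rule that an item whose group-by expression yields several values is placed in one group per distinct value. The word wrapper element and its verses attribute are hypothetical, and the sketch lists each verse once per word, without the per-occurrence context produced by f:displayContext(). Whether a processor really touches the text only once is up to the implementation.

```xml
<xsl:template match="/">
  <concord>
    <!-- Each verse joins one group for every distinct word it contains -->
    <xsl:for-each-group select="//v"
         group-by="distinct-values(
                     tokenize(lower-case(string(.)), '[\s.?!,;—:\-]+')[.])">
      <xsl:sort select="current-grouping-key()"/>

      <!-- current-grouping-key() is the word,
           current-group() holds the verses that contain it -->
      <word w="{current-grouping-key()}" verses="{count(current-group())}">
        <xsl:for-each select="current-group()">
          <occurs book="{substring(ancestor::book[1]/bktshort,1,3)}"
                  chapter="{count(ancestor::chapter[1]
                                  /preceding-sibling::chapter)+1}"
                  verse="{count(preceding-sibling::v)+1}"/>
        </xsl:for-each>
      </word>
    </xsl:for-each-group>
  </concord>
</xsl:template>
```

The memory cost is the concern raised above: the processor has to hold every group until grouping is complete.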

Below is the XSLT code:


<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:f="http://fxsl.sf.net/"
 exclude-result-prefixes="f xs">

 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <concord>
    <xsl:sequence select="f:wordConcord(/,'loved')"/>
  </concord>

 </xsl:template>

 <xsl:function name="f:wordConcord" as="element()*">
   <xsl:param name="pDoc" as="document-node()"/>
   <xsl:param name="pWord" as="xs:string"/>

   <xsl:for-each select=
    "$pDoc/tstmt/bookcoll/book/chapter/(.|div)
                               /v[contains(.,$pWord)]">
      <xsl:variable name="vverseWords"
       select="tokenize(lower-case(string(.)), '[\s.?!,;—:\-]+')[.]"/>

      <xsl:if test="$pWord = $vverseWords">
       <xsl:variable name="vVerse" select="."/>
       <xsl:for-each select="$vverseWords[. = $pWord]">
                     <occurs w="{$pWord}"
                       book="{substring($vVerse/ancestor::book[1]/bktshort,1,3)}"
                       chapter="{count($vVerse/ancestor::chapter[1]
                                             /preceding-sibling::chapter)+1}"
                       verse="{count($vVerse/preceding-sibling::v)+1}"
                     >
                      <xsl:sequence select=
                      "f:displayContext(string($vVerse), $pWord, position(), 15)"
                      />
                     </occurs>
             </xsl:for-each>
           </xsl:if>
   </xsl:for-each>
 </xsl:function>

 <xsl:function name="f:displayContext" as="xs:string">
  <xsl:param name="pText" as="xs:string"/>
  <xsl:param name="pWord" as="xs:string"/>
  <xsl:param name="pwordNum" as="xs:integer"/>
  <xsl:param name="pRadius" as="xs:integer"/>

   <xsl:variable name="vwOffset" select=
   "f:nthWord($pText, $pWord, $pwordNum, 0)"
   />

   <xsl:variable name="vWLen" select="string-length($pWord)"/>

   <xsl:variable name="vText2" select=
   "concat(substring($pText,1,$vwOffset ),
           substring($pWord,1,1),
           '.',
           substring($pText,$vwOffset+$vWLen+1)
           )"
   />
   <xsl:variable name="vStart" select=
   "if($vwOffset > $pRadius)
     then $vwOffset - $pRadius
     else 1"
   />
   <xsl:sequence select=
   "substring($vText2, $vStart, 2*$pRadius+$vWLen)"
   />
 </xsl:function>

 <xsl:function name="f:nthWord" as="xs:integer">
  <xsl:param name="pText" as="xs:string"/>
  <xsl:param name="pWord" as="xs:string"/>
  <xsl:param name="pwordNum" as="xs:integer"/>
  <xsl:param name="pOffset" as="xs:integer"/>

  <xsl:variable name="vZ" select="'ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ'"/>
  <xsl:variable name="vWLen" select="string-length($pWord)"/>
  <xsl:variable name="vTLen" select="string-length($pText)"/>

  <xsl:variable name="vZWord" select=
                "substring($vZ,1,$vWLen)"/>

  <xsl:sequence select=
   "if($pwordNum lt 1)
      then $pOffset - $vWLen +1
      else
        for $txt in replace($pText, concat('[^\w]',$pWord,
                            '[^\w]'),
                            concat(' ',$vZWord, ' ')
                            ),
        $off in string-length(substring-before($txt,$vZWord))
          return
            f:nthWord(substring($pText, $off + $vWLen),
                      $pWord,
                      $pwordNum - 1,
                      $pOffset
                      +$vTLen -string-length(substring-after($txt,$vZWord)) -1
                      )
    "
   />
 </xsl:function>
</xsl:stylesheet>

When this moderate-length (105 lines) transformation is applied to the
XML version of the Old Testament, ot.xml (omitted for brevity),

the following result is produced:

<concord>
   <occurs w="loved" book="Gen" chapter="24" verse="67">is wife; and
he l. her: and Isaac w</occurs>
   <occurs w="loved" book="Gen" chapter="25" verse="28">And Isaac l.
Esau, because he did e</occurs>
   <occurs w="loved" book="Gen" chapter="25" verse="28">on: but
Rebekah l. Jacob.
 </occurs>
   <occurs w="loved" book="Gen" chapter="27" verse="14">h as his father l..
 </occurs>
   <occurs w="loved" book="Gen" chapter="29" verse="18">And Jacob l.
Rachel; and said, I wi</occurs>
   <occurs w="loved" book="Gen" chapter="29" verse="30"> Rachel, and
he l. also Rachel more</occurs>
   <occurs w="loved" book="Gen" chapter="34" verse="3">f Jacob, and he
l. the damsel, and </occurs>
   <occurs w="loved" book="Gen" chapter="37" verse="3">Now Israel l.
Joseph more than all </occurs>
   <occurs w="loved" book="Gen" chapter="37" verse="4">at their father
l. him more than al</occurs>
   <occurs w="loved" book="Deu" chapter="4" verse="37">And because he
l. thy fathers, ther</occurs>
   <occurs w="loved" book="Deu" chapter="7" verse="8">ecause the LORD
l. you, and because</occurs>
   <occurs w="loved" book="Deu" chapter="23" verse="5">he LORD thy God l. thee.
 </occurs>
   <occurs w="loved" book="Deu" chapter="33" verse="3">Yea, he l. the
people; all his sain</occurs>
   <occurs w="loved" book="Jud" chapter="16" verse="4">erward, that he
l. a woman in the v</occurs>
   <occurs w="loved" book="1 S" chapter="1" verse="5">portion; for he
l. Hannah: but the </occurs>
   <occurs w="loved" book="1 S" chapter="16" verse="21">ore him: and
he l. him greatly; and</occurs>
   <occurs w="loved" book="1 S" chapter="18" verse="1">d, and Jonathan
l. him as his own s</occurs>
   <occurs w="loved" book="1 S" chapter="18" verse="3">ant, because he
l. him as his own s</occurs>
   <occurs w="loved" book="1 S" chapter="18" verse="16">srael and
Judah l. David, because h</occurs>
   <occurs w="loved" book="1 S" chapter="18" verse="20">Saul's
daughter l. David: and they </occurs>
   <occurs w="loved" book="1 S" chapter="18" verse="28">Saul's daughter l. him.
 </occurs>
   <occurs w="loved" book="1 S" chapter="20" verse="17">ain, because
he l. him: for he love</occurs>
   <occurs w="loved" book="1 S" chapter="20" verse="17">ved him: for
he l. him as he loved </occurs>
   <occurs w="loved" book="1 S" chapter="20" verse="17">loved him as
he l. his own soul.
 </occurs>
   <occurs w="loved" book="2 S" chapter="12" verse="24">n: and the LORD l. him.
 </occurs>
   <occurs w="loved" book="2 S" chapter="13" verse="1">he son of David l. her.
 </occurs>
   <occurs w="loved" book="2 S" chapter="13" verse="15">herewith he
had l. her. And Amnon s</occurs>
   <occurs w="loved" book="1 K" chapter="3" verse="3">And Solomon l.
the LORD, walking in</occurs>
   <occurs w="loved" book="1 K" chapter="10" verse="9">ecause the LORD
l. Israel for ever,</occurs>
   <occurs w="loved" book="1 K" chapter="11" verse="1">ut king Solomon
l. many strange wom</occurs>
   <occurs w="loved" book="2 C" chapter="2" verse="11">e the LORD hath
l. his people, he h</occurs>
   <occurs w="loved" book="2 C" chapter="9" verse="8">because thy God
l. Israel, to estab</occurs>
   <occurs w="loved" book="2 C" chapter="11" verse="21">And Rehoboam
l. Maachah the daughte</occurs>
   <occurs w="loved" book="2 C" chapter="26" verse="10"> Carmel: for
he l. husbandry.
 </occurs>
   <occurs w="loved" book="Est" chapter="2" verse="17">And the king l.
Esther above all th</occurs>
   <occurs w="loved" book="Job" chapter="19" verse="19">and they whom
I l. are turned again</occurs>
   <occurs w="loved" book="Psa" chapter="26" verse="8">LORD, I have l.
the habitation of t</occurs>
   <occurs w="loved" book="Psa" chapter="47" verse="4">f Jacob whom he
l.. Selah.
 </occurs>
   <occurs w="loved" book="Psa" chapter="78" verse="68">t Zion which he l..
 </occurs>
   <occurs w="loved" book="Psa" chapter="109" verse="17">As he l.
cursing, so let it come un</occurs>
   <occurs w="loved" book="Psa" chapter="119" verse="7">s, which I have l..
 </occurs>
   <occurs w="loved" book="Psa" chapter="119" verse="8">s, which I
have l.; and I will medi</occurs>
   <occurs w="loved" book="Isa" chapter="43" verse="4">ble, and I have
l. thee: therefore </occurs>
   <occurs w="loved" book="Isa" chapter="48" verse="14">? The LORD
hath l. him: he will do </occurs>
   <occurs w="loved" book="Jer" chapter="2" verse="25"> no; for I have
l. strangers, and a</occurs>
   <occurs w="loved" book="Jer" chapter="8" verse="2"> whom they have
l., and whom they h</occurs>
   <occurs w="loved" book="Jer" chapter="14" verse="10"> Thus have
they l. to wander, they </occurs>
   <occurs w="loved" book="Jer" chapter="31" verse="3">ng, Yea, I have
l. thee with an eve</occurs>
   <occurs w="loved" book="Eze" chapter="16" verse="37"> that thou
hast l., with all them t</occurs>
   <occurs w="loved" book="Hos" chapter="9" verse="1"> God, thou hast
l. a reward upon ev</occurs>
   <occurs w="loved" book="Hos" chapter="9" verse="10">cording as they l..
 </occurs>
   <occurs w="loved" book="Hos" chapter="11" verse="1">a child, then I
l. him, and called </occurs>
   <occurs w="loved" book="Mal" chapter="1" verse="2">I have l. you,
saith the LORD. Yet </occurs>
   <occurs w="loved" book="Mal" chapter="1" verse="2">erein hast thou
l. us? Was not Esau</occurs>
   <occurs w="loved" book="Mal" chapter="1" verse="2">the LORD: yet I l. Jacob,
 </occurs>
   <occurs w="loved" book="Mal" chapter="2" verse="11">e LORD which he
l., and hath marrie</occurs>
</concord>


Hope this helped.

--
Cheers,
Dimitre Novatchev
---------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all.


On 11/6/05, Dimitre Novatchev <dnovatchev(_at_)gmail(_dot_)com> wrote:
Just curious, do you think XSL is the best tool for this job or
something that can be used to do this job?

Can't say in advance.

As we know, XSLT 2.0 has better string processing capabilities (such
as regular expressions) and is easier and more appropriate to use for
string processing than XSLT 1.0.

My personal preference would be to use FXSL 2.0, having used it
successfully for other string processing tasks such as spelling
checking and text justification.

Also, Saxon 8.6 just came out with a huge improvement in appending to
a sequence -- it is logical to expect a very similar improvement for
string concatenation in the future...

To summarise: I wouldn't be surprised if XSLT 2.0 + FXSL 2.0 handle
this task better than expected.

