xsl-list

Re: Linenumbering & word index

2004-08-06 09:39:34
On Fri, 6 Aug 2004, David Carlisle wrote:

You can't do 
tokenize(l/text(), '\s+')
because it wants a single string as its first argument, and l/text()
probably selects more than one text node.

Yup.  And that's one of the places I was getting confuddled. :-(

You can do
 select="for $l in l return tokenize($l,'\s+')"
or do the same with xsl:for-each, tokenizing them one at a time.
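To see why the per-line loop is needed, here is a rough Python analogue of `for $l in l return tokenize($l,'\s+')`. It is only an illustration: the function name `tokenize_lines` and the sample lines are made up. Like XPath's `tokenize()`, Python's `re.split()` takes one string at a time, so the lines are split one by one and the results are concatenated into a single flat sequence.

```python
import re

def tokenize_lines(lines):
    # Rough analogue of: for $l in l return tokenize($l, '\s+')
    # re.split, like tokenize(), takes one string at a time, so each
    # line is split separately and the results are concatenated.
    return [token for line in lines for token in re.split(r"\s+", line)]

# Hypothetical text content of two <l> elements.
print(tokenize_lines(["roses are red", "violets are blue"]))
```

Passing the whole list to `re.split()` raises a `TypeError`, much as handing `tokenize()` a multi-node `l/text()` raises a type error.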

OK, I think I understand that, and it might work for smaller things.

However, you really want to make yourself a tree first, something like:

Let's see if I understand the way this works. (I do like getting
solutions, but I also want to learn ;-) )

<xsl:template match="/">
<xsl:variable name="x">
<xsl:apply-templates mode="a" select="div[@type='poem']"/>
</xsl:variable>

This creates the variable $x by applying the mode "a" templates below
to the poem divs only. (See, now *that* is how to avoid the
stuff I don't want to include... *doh*)

[
<xsl:copy-of  select="$x"/>
]

This outputs a copy of the temporary tree, listing each poem and
every word in each of its lines.

<xsl:for-each-group select="$x/div/l/word" group-by=".">

This groups the <word> elements in the temporary tree by their value,
sorts the groups, and outputs each distinct word on a new line:
 <xsl:sort />
  <xsl:text>&#10;</xsl:text>
  <xsl:value-of select="."/>

Then, for each occurrence of that word (keys always confuse me), it
outputs the poem number (@poem) and the line number (@n).

  <xsl:for-each select="key('w',.)">
  <xsl:text> </xsl:text>
  <xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
  </xsl:for-each>
</xsl:for-each-group>
</xsl:template>
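The effect of the `xsl:for-each-group` loop can be sketched in Python, assuming the temporary tree has already been flattened into hypothetical `(poem, line, word)` tuples in document order. Sorting by word and then grouping gives one output line per distinct word followed by a `poem:line` reference for each occurrence, which should match the shape of the XSLT output (minus the bracketed copy of `$x`). The sketch uses each group's own members; for this tree those should be the same nodes that `key('w', .)` returns, since every `word` in `$x` sits under `div/l`.

```python
from itertools import groupby
from operator import itemgetter

def format_index(occurrences):
    # occurrences: hypothetical (poem, line, word) tuples, one per <word>
    # in $x/div/l/word, listed in document order.
    word_of = itemgetter(2)
    # sorted() is stable, so each word's occurrences keep document order;
    # this plays the part of for-each-group group-by="." plus xsl:sort.
    ordered = sorted(occurrences, key=word_of)
    lines = []
    for word, group in groupby(ordered, key=word_of):
        refs = " ".join(f"{poem}:{line}" for poem, line, _ in group)
        lines.append(f"{word} {refs}")
    return "\n".join(lines)

sample = [(1, "head", "spring"), (1, 1, "in"), (1, 1, "spring"), (2, 3, "in")]
print(format_index(sample))
```

Python's `sorted()` compares by code point, so the word order may differ from your XSLT processor's default collation.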


The mode "a" template for div numbers each poem by its position and
applies mode "a" only to its head and its lg/l lines. (Modes... yes, I must use modes more.)
<xsl:template mode="a" match="div">
<div poem="{position()}">
<xsl:apply-templates mode="a" select="head"/>
<xsl:apply-templates mode="a" select="lg/l"/>
</div>
</xsl:template>
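A hedged Python sketch of the numbering that mode "a" produces may help. The TEI-like sample and the name `build_temp_tree` are made up, the poem divs are assumed to be children of the sample's root element, head handling is left out, and ElementTree's limited path syntax (`div[@type='poem']`, `lg/l`) stands in for the XSLT selections. Because only poem divs are selected, `position()` counts poems alone. Because `lg/l` gathers every line of a poem, line numbers run on across line groups.

```python
import xml.etree.ElementTree as ET

# Hypothetical source: two poems and one non-poem div to be skipped.
SOURCE = """<body>
  <div type="poem"><lg><l>roses are red</l><l>violets are blue</l></lg></div>
  <div type="prose"><p>not part of the index</p></div>
  <div type="poem"><lg><l>sugar is sweet</l></lg><lg><l>and so are you</l></lg></div>
</body>"""

def build_temp_tree(source):
    root = ET.fromstring(source)
    temp = ET.Element("x")  # stands in for the root of the $x tree
    # Only poem divs are selected, so position() counts poems alone.
    for poem_no, div in enumerate(root.findall("div[@type='poem']"), start=1):
        out_div = ET.SubElement(temp, "div", poem=str(poem_no))
        # lg/l selects every line of the poem, so numbering runs across
        # line groups, like position() inside the l template.
        for line_no, line in enumerate(div.findall("lg/l"), start=1):
            out_l = ET.SubElement(out_div, "l", n=str(line_no))
            # str.split() drops empty tokens, unlike tokenize(., '\s+').
            for token in (line.text or "").split():
                ET.SubElement(out_l, "word").text = token
    return temp

print(ET.tostring(build_temp_tree(SOURCE), encoding="unicode"))
```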


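When you find a head, tokenize it (splitting on whitespace, commas, full stops
and exclamation marks) into a temporary tree of lower-cased <word> elements inside <l n="head">.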
<xsl:template mode="a" match="head">
<l n="head">
<xsl:for-each select="tokenize(.,'(\s|[,\.!])+')">
<word><xsl:value-of select="lower-case(.)"/></word>
</xsl:for-each>
</l>
</xsl:template>
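The head regex `(\s|[,\.!])+` can be tried out with a rough Python analogue. The function name and sample heading are hypothetical. Two details carry over. First, the group must be non-capturing in Python, because `re.split()` returns captured separators and XPath's `tokenize()` never does. Second, a separator at the very start or end of the string, such as a closing `!`, yields a zero-length token in both languages. In the template above that would become an empty `<word/>`.

```python
import re

# Same pattern as the head template, but with a non-capturing group:
# re.split() would return captured separators, tokenize() never does.
HEAD_SEPARATORS = re.compile(r"(?:\s|[,\.!])+")

def head_words(text, drop_empty=False):
    if text == "":
        return []  # tokenize() of a zero-length string is empty
    # lower-case() applied to every token, as in the template.
    tokens = [t.lower() for t in HEAD_SEPARATORS.split(text)]
    return [t for t in tokens if t] if drop_empty else tokens

print(head_words("The Rose, the Lily!"))
```

In XSLT 2.0, one possible way to drop the empty token is a predicate on the sequence, e.g. `tokenize(.,'(\s|[,\.!])+')[. != '']`.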


When you find an l, tokenize it on whitespace into a temporary tree
of <word> elements, recording the line's position in its n attribute.

<xsl:template mode="a" match="l">
<l n="{position()}">
<xsl:for-each select="tokenize(.,'\s+')">
<word><xsl:value-of select="."/></word>
</xsl:for-each>
</l>
</xsl:template>
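Unlike the head template, the l template splits on `\s+` only and does not lower-case. As a result, "Roses" and "roses" would land in different groups, and any leading whitespace in a line's text produces an empty first token. The Python below is a hedged analogue showing both effects. The name `line_words` is hypothetical, and `" ".join(text.split())` only approximates `normalize-space()`. In XSLT 2.0, one possible variant would be `tokenize(lower-case(normalize-space(.)), '\s+')`.

```python
import re

def line_words(text, normalize=False):
    if normalize:
        # Roughly normalize-space() then lower-case().
        text = " ".join(text.split()).lower()
    if text == "":
        return []  # tokenize() of a zero-length string is empty
    # Roughly tokenize(., '\s+').
    return re.split(r"\s+", text)

print(line_words("  Roses are red"))
print(line_words("  Roses are red", normalize=True))
```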


Declare a key named w that indexes every <word> element by its string
value, so that key('w', .) returns every occurrence of a given word in the temporary tree.
<xsl:key name="w" match="word" use="."/>
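Keys may be easier to think about as a lookup table built once per document. Each element matched by `match="word"` is filed under the value of `use="."`, and `key('w', .)` returns every element filed under the current word, in document order. The Python below only illustrates that mental model with made-up data (the name `build_key` is hypothetical). It says nothing about how a real XSLT processor implements keys.

```python
from collections import defaultdict

def build_key(words):
    # words: hypothetical (poem, line, text) tuples, one per <word>,
    # in document order. use="." files each one under its string value.
    index = defaultdict(list)
    for poem, line, text in words:
        index[text].append((poem, line))
    return index

w = build_key([(1, 1, "rose"), (1, 2, "red"), (2, 1, "rose")])
print(w["rose"])          # like key('w', 'rose'): both occurrences
print(w.get("blue", []))  # no match: like an empty node sequence
```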

It seems to work absolutely perfectly. (Well, I'll customise
the tokenize regex...)

Many, many thanks.

-James

---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk 
CALL FOR PAPERS: Digital Medievalism (Kalamazoo) and 
Early Drama (Leeds) see http://users.ox.ac.uk/~jamesc/cfp.html

