On Fri, 6 Aug 2004, David Carlisle wrote:
You can't do
tokenize(l/text(), '\s+')
because it wants a single string as its first argument and that's
probably more than one.
Yup. And that's one of the places I was getting confuddled. :-(
You can do
select="for $l in l return tokenize($l,'\s+')"
or same with for-each and tokenize them one at a time.
OK, I think I understand that, and it might work for smaller things.
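For reference, a minimal sketch of the two forms mentioned above, assuming XSLT 2.0. The match on lg and the bare-bones output are illustrative assumptions only, not part of the stylesheet discussed in this thread.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: l elements are children of lg; adjust the match as needed. -->
  <xsl:template match="lg">
    <!-- Form 1: the 'for' expression calls tokenize() once per l,
         so each call receives exactly one string. -->
    <xsl:variable name="words"
        select="for $l in l return tokenize($l, '\s+')"/>
    <xsl:value-of select="$words" separator="|"/>

    <!-- Form 2: the same idea with xsl:for-each, one l at a time. -->
    <xsl:for-each select="l">
      <xsl:for-each select="tokenize(., '\s+')">
        <word><xsl:value-of select="."/></word>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Either way, tokenize() never sees more than one item as its first argument, which is what the failing l/text() version got wrong.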
However, you really want to make yourself a tree first, something like:
Let's see if I understand the way this works. (I do like getting
solutions, but also want to learn ;-) )
<xsl:template match="/">
<xsl:variable name="x">
<xsl:apply-templates mode="a" select="div[@type='poem']"/>
</xsl:variable>
Creates variable $x from the templates of mode a below for
only the poem divs. (See, now *that* is how to avoid the
stuff I don't want to include.. *doh*)
[
<xsl:copy-of select="$x"/>
]
Copies out the temporary tree, which lists each poem and every
word in each line of that poem.
<xsl:for-each-group select="$x/div/l/word" group-by=".">
Groups the words in the temporary tree by their value, sorts
the groups, and outputs each word
<xsl:sort />
<xsl:text> </xsl:text>
<xsl:value-of select="."/>
then for each instance of a word (keys always confuse me) it
outputs the @poem and @n line numbers.
<xsl:for-each select="key('w',.)">
<xsl:text> </xsl:text>
<xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
</xsl:for-each>
</xsl:for-each-group>
</xsl:template>
The mode a template for div applies templates, still in mode a,
only to head and lg/l (modes...yes, must use modes more.)
<xsl:template mode="a" match="div">
<div poem="{position()}">
<xsl:apply-templates mode="a" select="head"/>
<xsl:apply-templates mode="a" select="lg/l"/>
</div>
</xsl:template>
When you find a head, tokenize it into a temporary
tree of <word> elements
<xsl:template mode="a" match="head">
<l n="head">
<xsl:for-each select="tokenize(.,'(\s|[,\.!])+')">
<word><xsl:value-of select="lower-case(.)"/></word>
</xsl:for-each>
</l>
</xsl:template>
When you find an l, tokenize it into a temporary tree
of <word> elements, recording the line's position
<xsl:template mode="a" match="l">
<l n="{position()}">
<xsl:for-each select="tokenize(.,'\s+')">
<word><xsl:value-of select="."/></word>
</xsl:for-each>
</l>
</xsl:template>
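One thing to watch for when customising the pattern: in XPath 2.0, tokenize() returns a zero-length string when a separator occurs at the very start or end of the input, so a head ending in "!" or a line with leading whitespace would produce an empty <word> element. A sketch of one way to guard against that, assuming the same temporary-tree layout as above. The [. ne ''] predicate, and the reuse of the head's pattern and lower-case() for l, are additions for illustration and not part of the stylesheet quoted here.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template mode="a" match="l">
    <l n="{position()}">
      <!-- The predicate drops the zero-length tokens that tokenize()
           emits for a leading or trailing separator. -->
      <xsl:for-each select="tokenize(., '(\s|[,\.!])+')[. ne '']">
        <!-- Assumption: lines are lower-cased like the head, so that
             "Rose" and "rose" end up in the same group. -->
        <word><xsl:value-of select="lower-case(.)"/></word>
      </xsl:for-each>
    </l>
  </xsl:template>

</xsl:stylesheet>
```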
Declares a key named w that indexes every <word> element
we've just created by its string value.
<xsl:key name="w" match="word" use="."/>
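Since keys are a common point of confusion, it may help to note that inside xsl:for-each-group the function current-group() returns the items of the group being processed. Because both the grouping and the key use the word's string value, the lookup could arguably be written without the key at all. A sketch under that assumption; the literal contents of $x are a hypothetical stand-in for the temporary tree so that the example stands alone.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Hypothetical stand-in for the temporary tree built in mode a. -->
    <xsl:variable name="x">
      <div poem="1">
        <l n="head"><word>rose</word></l>
        <l n="1"><word>a</word><word>rose</word></l>
      </div>
    </xsl:variable>

    <xsl:for-each-group select="$x/div/l/word" group-by=".">
      <!-- Sort the groups by the word they share. -->
      <xsl:sort select="current-grouping-key()"/>
      <xsl:text>&#10;</xsl:text>
      <xsl:value-of select="current-grouping-key()"/>
      <!-- current-group() holds every word element in this group,
           so no key lookup is needed to find the instances. -->
      <xsl:for-each select="current-group()">
        <xsl:text> </xsl:text>
        <xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
      </xsl:for-each>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

The key-based version remains useful when the same lookup is needed outside the grouping loop.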
Seems to work absolutely perfectly. (well, I'll customise
the tokenize string...)
Many many thanks.
-James
---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk
CALL FOR PAPERS: Digital Medievalism (Kalamazoo) and
Early Drama (Leeds) see http://users.ox.ac.uk/~jamesc/cfp.html