OK, following the helpful advice on line numbering
for displaying poems, I also want to create a
word index to these same poems.
So again given something like:
<body>
<header>
<title>A poem that should really, certainly, be included</title>
</header>
<div type="poem">
<head>headers should be included in word index</head>
<lg>
<l>This is a line that should be included</l>
<l>This is a line that should be included</l>
</lg>
<lg>
<l>This is a line that really should be included</l>
<l>This is a line that should be included</l>
</lg>
</div>
<div type="poem">
<head>headers should certainly be included in word index</head>
<lg>
<l>This is a line that really should be included</l>
<l>This is a line that <supplied>should</supplied> certainly be included</l>
</lg>
<lg>
<l>This is a line that really should be included</l>
<!-- etc -->
</lg>
</div>
</body>
What I want to output is a list counting and indexing all
the words inside <l> and <head>, giving the poem number and
line number of each occurrence, so something like:
------------
certainly (2): 2:head, 2:2.
really (3): 1:3, 2:1, 2:3.
should (9): 1:head, 1:1, 1:2, 1:3, 1:4, 2:head, 2:1, 2:2, 2:3.
-----------
(well really, I'll do an xml version, but you get the picture)
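For the XML version I have in mind something roughly like this
(the element and attribute names are placeholders made up for
this message, nothing is settled):
------------
<index>
<entry word="certainly" count="2">
<ref poem="2" line="head"/>
<ref poem="2" line="2"/>
</entry>
<!-- etc -->
</index>
------------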
Now, I had done a word-frequency-list-of-entire-file before
by using:
----------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:template match="/">
<!-- turn commas into spaces, delete . ! : ; then lower-case,
     split on whitespace and drop empty tokens -->
<xsl:for-each-group
select="tokenize(lower-case(string(translate(.,',.!:;',' '))),'\s+')[string(.)]"
group-by=".">
<xsl:sort />[<xsl:value-of select="."/> - <xsl:value-of select="count(current-group())"/>]
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>
----------
But I can't see how to get each word's position (poem number and
line number) whilst tokenizing the whole lot. Nothing I have tried
so far works.
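One direction that might work, offered only as an untested sketch:
instead of tokenizing the whole document at once, tokenize each
<head> and <l> separately, record where every word came from in a
temporary tree, and then group that tree. The <w> element and its
poem/ref attributes are names invented here for illustration, and
the sketch assumes un-namespaced markup exactly like the sample
above, with lines numbered continuously through each poem.
----------
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <!-- Pass 1: one hypothetical <w> per word, tagged with its origin -->
    <xsl:variable name="words">
      <xsl:for-each select="//div[@type='poem']">
        <xsl:variable name="poem" select="position()"/>
        <xsl:for-each select="head | .//l">
          <!-- 'head' for headers, otherwise the line's number within its poem -->
          <xsl:variable name="ref">
            <xsl:choose>
              <xsl:when test="self::head">head</xsl:when>
              <xsl:otherwise>
                <xsl:number count="l" level="any" from="div"/>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:variable>
          <xsl:for-each
              select="tokenize(lower-case(translate(.,',.!:;',' ')),'\s+')[string(.)]">
            <w poem="{$poem}" ref="{$ref}"><xsl:value-of select="."/></w>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:for-each>
    </xsl:variable>
    <!-- Pass 2: group the tagged words and list where each occurs -->
    <xsl:for-each-group select="$words/w" group-by=".">
      <xsl:sort select="current-grouping-key()"/>
      <xsl:value-of select="current-grouping-key()"/>
      <xsl:text> (</xsl:text>
      <xsl:value-of select="count(current-group())"/>
      <xsl:text>): </xsl:text>
      <xsl:value-of
          select="for $w in current-group() return concat($w/@poem,':',$w/@ref)"
          separator=", "/>
      <xsl:text>.&#10;</xsl:text>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
----------
If a word occurs twice in one line, this would list that reference
twice, which may or may not be what is wanted.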
Suggestions?
-James
---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk