-----Original Message-----
From: James Cummings
[mailto:James(_dot_)Cummings(_at_)ota(_dot_)ahds(_dot_)ac(_dot_)uk]
Sent: 06 August 2004 14:41
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Linenumbering & word index
On Fri, 6 Aug 2004, David Carlisle wrote:
I lost or forgot the start of this thread so I'll ignore your main
questions but I can answer one of the questions in comments
Right, I'll start from the beginning again then.
In a document with a lot of poems laid out as:
<div type="poem">
<head>headers should be included in word index</head>
<lg>
<l>This is a line that really should be included</l>
<l>This is a line that should be included</l>
</lg>
<p>This shouldn't be included</p>
<lg>
<l>This is a line that really should be included</l>
<l>This is a line that should be included</l>
</lg>
</div>
What I want to produce is a word-index of
poem number and line number, something like:
a (4) -- 1:1, 1:2, 1:3, 1:4, 2:3, 2:5 (well, no poem 2 here ;-) )
be (5) -- 1:head, 1:1, 1:2, 1:3, 1:4
...
really (2) -- 1:1, 1:3, 2:1, 2:3 (if it was in poem 2 as well)
What I was trying to suggest was that you go in two phases:
(a) build a list containing (word, poem number, line number)
(b) group that list by word
and that the output of (a) should be a temporary tree. Sorry if the
reference to position() confused you - I was concentrating on the top-level
design, not the detail.
For example phase 1 might actually be
<xsl:variable name="wordlist">
<xsl:for-each select="//text()">
<xsl:for-each select="tokenize(., xxx)">
<word w="{.}">
<poem><xsl:number count="poem"/></poem>
<line><xsl:number count="l"/></line>
</word>
</xsl:for-each>
</xsl:for-each>
</xsl:variable>
Michael Kay
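As a hedged illustration of phase (a), here is a minimal XSLT 2.0 sketch, not necessarily what was intended above. It assumes the poems are in no namespace, that lines are numbered continuously within each div of type "poem", and that a head is labelled "head" instead of being given a number; the attribute names poem and line are made up for the example. The head or l element is handled first, so that xsl:number still has a node as its context. Inside the tokenize() loop the context item is a string, and xsl:number cannot be evaluated there.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<!-- Phase (a): one <word> element per token, tagged with poem and line -->
<xsl:variable name="wordlist">
  <!-- Visit only headers and lines of poems; p and other content is skipped -->
  <xsl:for-each select="//div[@type='poem']/(head | .//l)">
    <!-- Poem number, counted across the whole document -->
    <xsl:variable name="poem">
      <xsl:number level="any" count="div[@type='poem']"/>
    </xsl:variable>
    <!-- Line number restarts at each poem; a head gets the label 'head' -->
    <xsl:variable name="line">
      <xsl:choose>
        <xsl:when test="self::head">head</xsl:when>
        <xsl:otherwise>
          <xsl:number level="any" count="l" from="div[@type='poem']"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <!-- Punctuation becomes spaces, then split on whitespace, dropping empty tokens -->
    <xsl:for-each select="tokenize(lower-case(translate(., ',.!:;', '     ')), '\s+')[string(.)]">
      <word w="{.}" poem="{$poem}" line="{$line}"/>
    </xsl:for-each>
  </xsl:for-each>
</xsl:variable>

<!-- Dump the temporary tree so it can be inspected before grouping -->
<xsl:template match="/">
  <wordlist>
    <xsl:copy-of select="$wordlist"/>
  </wordlist>
</xsl:template>

</xsl:stylesheet>
```

If the sample poem is the first one in the document, every word of its first l comes out with poem="1" and line="1", and the words of its head with line="head".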
I had previously done word frequency lists as:
-------
<xsl:template match="/">
<xsl:for-each-group
select="tokenize(lower-case(string(translate(.,',.!:;','     '))),'\s+')[string(.)]"
group-by=".">
<xsl:sort />[<xsl:value-of select="."/> - <xsl:value-of
select="count(current-group())"/>]
</xsl:for-each-group>
</xsl:template>
------
And Mike suggested I first build a temporary tree something like:
<xsl:variable name="words">
<xsl:for-each select="tokenize(., '\s+')">
<word value="{.}" position="{position()}"/>
</xsl:for-each>
</xsl:variable>
But I don't see a) how to tokenize only the output of l/text() and
head/text() (it complains of multiple inputs when I do so), or
b) how to get the line number and poem number based on position().
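On point a), the complaint about multiple inputs is most likely a type error: the first argument of tokenize() must be a single string, so passing it every l/text() node at once is rejected. A small sketch of one way around that, assuming XSLT 2.0 and no namespace on the TEI elements, is to loop over the nodes and tokenize them one at a time. It only lists the words, one per output line.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<xsl:template match="/">
  <!-- tokenize(//l/text(), '\s+') fails as soon as more than one node is selected. -->
  <!-- The 'for' expression hands tokenize() one head or l element at a time. -->
  <xsl:value-of
      select="for $n in //div[@type='poem']//(head | l)
              return tokenize(normalize-space($n), '\s+')"
      separator="&#10;"/>
</xsl:template>

</xsl:stylesheet>
```

On point b), position() inside a tokenize() loop is the position of the word within the string being tokenized, so it cannot give the line or poem number by itself. Those have to come from the l or head node, for instance via xsl:number, before the string is split.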
--------------
My completely messed up xsl so far is:
<xsl:template match="l/text()">
<xsl:for-each-group select="$words" group-by=".">
<xsl:sort/>
<xsl:value-of select="word/@value"/> --
<xsl:for-each select="current-group()">
<a href="#{concat('poem',@poemnumber,'line',@linenumber)}">
<xsl:value-of select="@poemnumber"/>:<xsl:value-of
select="@linenumber"/></a>
</xsl:for-each>
</xsl:for-each-group>
</xsl:template>
<xsl:variable name="words">
<xsl:for-each select="tokenize(lower-case(string(translate(.,',.!:;','     '))),'\s+')[string(.)]">
<!-- How do I only match text in 'head' and 'l' elements? -->
<xsl:variable name="poemnumber">
<!-- How do I get poem number here? i.e. xsl:number
count="div[@type='poem']" when I was matching 'l' -->
</xsl:variable>
<xsl:variable name="linenumber">
<!-- How do I get line number here? i.e. xsl:number
from="div[@type='poem']" when I was matching 'l' -->
</xsl:variable>
<word value="{.}" litposition="{position()}" poemnumber="{$poemnumber}"
linenumber="{$linenumber}"/>
</xsl:for-each>
</xsl:variable>
<!-- some of the things I don't want to match -->
<xsl:template match="teiHeader|foreign|p|milestone|gap"
priority="-1" />
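For the grouping step, a minimal sketch follows, assuming XSLT 2.0. The $words variable here is a hand-written stand-in for the temporary tree, with the attribute names used in the attempt above. The differences from that attempt are that the grouping selects the word elements ($words/word, not the $words document node), groups on @value, and runs from a template matching the document root, not from one matching l/text().

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="text"/>

<!-- Stand-in data: in the real stylesheet this tree is built from the poems -->
<xsl:variable name="words">
  <word value="this" poemnumber="1" linenumber="1"/>
  <word value="is" poemnumber="1" linenumber="1"/>
  <word value="a" poemnumber="1" linenumber="1"/>
  <word value="this" poemnumber="1" linenumber="2"/>
  <word value="a" poemnumber="1" linenumber="2"/>
</xsl:variable>

<xsl:template match="/">
  <!-- One group per distinct word, sorted alphabetically -->
  <xsl:for-each-group select="$words/word" group-by="@value">
    <xsl:sort select="@value"/>
    <xsl:value-of select="current-grouping-key()"/>
    <xsl:text> (</xsl:text>
    <!-- Number of occurrences of this word -->
    <xsl:value-of select="count(current-group())"/>
    <xsl:text>) -- </xsl:text>
    <!-- poem:line reference for each occurrence -->
    <xsl:value-of
        select="current-group()/concat(@poemnumber, ':', @linenumber)"
        separator=", "/>
    <xsl:text>&#10;</xsl:text>
  </xsl:for-each-group>
</xsl:template>

</xsl:stylesheet>
```

With the stand-in data this should print "a (2) -- 1:1, 1:2", then "is (1) -- 1:1", then "this (2) -- 1:1, 1:2". The plain references could be wrapped in a elements with an href if HTML output with links is wanted.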
------------------
Does that clarify my confuddled state of mind?
-James
---
Dr James Cummings, Oxford Text Archive, University of Oxford
James dot Cummings at oucs dot ox dot ac dot uk
CALL FOR PAPERS: Digital Medievalism (Kalamazoo) and
Early Drama (Leeds) see http://users.ox.ac.uk/~jamesc/cfp.html