In XSLT 1.0 I would tackle this using the technique that I've started
referring to as "sibling recursion". The general pattern is:
(a) From the parent element do

  <xsl:apply-templates select="child::node()[1]" mode="sibling-recursion"/>

(b) Write one or more templates that match the child elements; the structure
of these is:

  <xsl:template match="xxx" mode="sibling-recursion">
    ... process this node ...
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      ... with-params ...
    </xsl:apply-templates>
  </xsl:template>
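
To make the pattern a little more concrete, here is a minimal sketch of sibling recursion that carries the current verse id and line number forward as parameters, so nothing has to look backwards. It rests on assumptions that are not in the original post: the input elements are in no namespace, every line is a single text node, and nested divs are simply skipped. The parameter names "verse" and "line" are made up for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- (a) start the walk at the first child of the parent -->
  <xsl:template match="div">
    <div>
      <xsl:apply-templates select="child::node()[1]"
                           mode="sibling-recursion"/>
    </div>
  </xsl:template>

  <!-- a verse-number span resets the state: new verse id, line 1 -->
  <xsl:template match="span[@class='vn']" mode="sibling-recursion">
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      <xsl:with-param name="verse" select="string(@id)"/>
      <xsl:with-param name="line" select="1"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- a br ends the current line, so the line counter moves on -->
  <xsl:template match="br" mode="sibling-recursion">
    <xsl:param name="verse"/>
    <xsl:param name="line" select="1"/>
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      <xsl:with-param name="verse" select="$verse"/>
      <xsl:with-param name="line" select="$line + 1"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- any other node: emit non-blank text as a line, pass state along -->
  <xsl:template match="node()" mode="sibling-recursion">
    <xsl:param name="verse"/>
    <xsl:param name="line" select="1"/>
    <xsl:if test="self::text() and normalize-space() != ''">
      <l xml:id="{$verse}-{$line}">
        <xsl:value-of select="normalize-space()"/>
      </l>
    </xsl:if>
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      <xsl:with-param name="verse" select="$verse"/>
      <xsl:with-param name="line" select="$line"/>
    </xsl:apply-templates>
  </xsl:template>

</xsl:stylesheet>
```

The point of the design is that each template hands the state on to the next sibling explicitly, so an intervening element changes the result only if a template says so.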
In 2.0, converting "text<br/>" to "<line>text</line>" is often conveniently
done using xsl:for-each-group with group-ending-with="br".
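
As a rough sketch of that grouping, assuming the nodes of one verse have already been gathered under a hypothetical <verse> wrapper element (not something in the original input) and that the elements are in no namespace:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "verse" is a hypothetical wrapper holding text and br children -->
  <xsl:template match="verse">
    <lg>
      <!-- each group is a run of nodes ending at (and including) a br -->
      <xsl:for-each-group select="node()" group-ending-with="br">
        <xsl:variable name="content"
                      select="current-group()[not(self::br)]"/>
        <!-- skip groups with no real text, e.g. whitespace after the last br -->
        <xsl:if test="normalize-space(string-join($content, '')) != ''">
          <l>
            <xsl:copy-of select="$content"/>
          </l>
        </xsl:if>
      </xsl:for-each-group>
    </lg>
  </xsl:template>

</xsl:stylesheet>
```

Inside the loop, position() gives the number of the current group, which could be used to number the lines, though whitespace-only groups would also be counted unless they are filtered out first.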
This doesn't by itself help with your problem of handling the irregularities
in your input data. I think that when you have such irregularities, it's
often best to write a multiphase transformation in which each phase tries to
make the structure a bit more regular, making it easier for subsequent
phases to do their work.
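
A minimal sketch of how such a multiphase transformation might be wired up in 2.0, using one mode per phase and a variable to hold the intermediate tree. The mode names "phase1" and "phase2" are made up, the particular regularisation shown (unwrapping a nested poetry div into milestones) is only one guess at what a first phase might do, and the elements are assumed to be in no namespace:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- phase 1: regularise the input into a temporary tree -->
    <xsl:variable name="regular">
      <xsl:apply-templates mode="phase1"/>
    </xsl:variable>
    <!-- phase 2: build the final structure from the regular tree -->
    <xsl:apply-templates select="$regular/node()" mode="phase2"/>
  </xsl:template>

  <!-- phase 1 default: copy everything unchanged -->
  <xsl:template match="@*|node()" mode="phase1">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="phase1"/>
    </xsl:copy>
  </xsl:template>

  <!-- phase 1 override: replace a nested poetry div by start and end
       markers, so its content sits at the same level as the rest -->
  <xsl:template match="div[@class='poetry']" mode="phase1">
    <milestone type="poetry"/>
    <xsl:apply-templates mode="phase1"/>
    <milestone type="Endofpoetry"/>
  </xsl:template>

  <!-- phase 2 placeholder: an identity copy, to be replaced by the
       templates that do the real grouping -->
  <xsl:template match="@*|node()" mode="phase2">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="phase2"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Further phases can be added the same way, each reading the variable produced by the one before it.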
But I'm afraid these are only rough ideas - I don't have time to get
immersed in the detail of what looks quite a challenging problem.
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: James Cummings [mailto:cummings(_dot_)james(_at_)gmail(_dot_)com]
Sent: 03 August 2005 10:49
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Grouping text nodes
Hi there,
I have some XHTML I'm trying to transform to add more structure to it.
It is a copy of the Latin Vulgate Bible. Currently the XHTML looks
something like this:
-----
<div class="chapter">
<span class="chapter-num">1</span>
<div class="poetrystartchapter">
<span class="vn"
id="x1_1">1</span> Beatus vir qui
non abiit in consilio impiorum,<br/> et
in via peccatorum
non stetit,<br/> et in cathedra pestilentiæ
non sedit ;<br/>
<span class="vn"
id="x1_2">2</span> sed in lege
Domini voluntas ejus,<br/> et in lege
ejus meditabitur die
ac nocte.<br/>
<span class="vn"
id="x1_3">3</span> Et erit tamquam
lignum quod plantatum est secus decursus
aquarum,<br/> quod
fructum suum dabit in tempore
suo :<br/> et folium
ejus non defluet ;<br/> et omnia
quæcumque faciet prosperabuntur.<br/>
...</div>...</div>
-----
What I want to get is something like:
-----
<div type="chapter" n="1">
<milestone type="poetrystartchapter"/>
<lg xml:id="x1_1" n="1">
<l xml:id="x1_1-1">Beatus vir qui
non abiit in consilio impiorum,</l>
<l xml:id="x1_1-2">et in via peccatorum
non stetit,</l>
<l xml:id="x1_1-3">et in cathedra
pestilentiæ non sedit </l>
</lg>
<lg xml:id="x1_2" n="2">
<l xml:id="x1_2-1"> sed in lege Domini
voluntas ejus,</l>
<l xml:id="x1_2-2">et in lege ejus meditabitur die
ac nocte.</l>
</lg>
<lg xml:id="x1_3">
<l xml:id="x1_3-1"> Et erit tamquam
lignum quod plantatum est secus decursus
aquarum,</l>
<l xml:id="x1_3-2"> quod fructum suum dabit in
tempore suo :</l>
<l xml:id="x1_3-3"> et folium ejus non
defluet;</l>
<l xml:id="x1_3-4"> et omnia quæcumque
faciet prosperabuntur.</l>
</lg>
<milestone type="EndOfpoetrystartchapter"/>
...</div>
-----
My problem is when I'm looking backwards to create the @xml:id for
each of the lines whilst grouping the text nodes into lines.
Sometimes there is extra existing structure which seems to get in the
way, where the <div> (if present at all) starts after the first line:
-----
<div class="chapter"><span class="chapter-num">118</span>
<span class="vn" id="x118_1">1</span> Alleluja.
<div class="poetry"><span
class="speaker">Aleph.</span> Beati
immaculati in via,<br/> qui ambulant in
lege Domini.<br/>
<span class="vn"
id="x118_2">2</span> Beati qui
scrutantur testimonia ejus ;<br/> in
toto corde
exquirunt eum.<br/>
-----
Which is supposed to come out something like:
-----
<div type="chapter" n="118">
<lg xml:id="x118_1" n="1">
<l xml:id="x118_1-1">Alleluja.
<milestone type="poetry"/>
<seg type="speaker">Aleph.</seg> Beati immaculati
in via,</l>
<l xml:id="x118_1-2"> qui ambulant in
lege Domini.</l>
</lg>
<lg>
<l xml:id="x118_2-1"> Beati qui scrutantur
testimonia ejus; </l>
<l xml:id="x118_2-2"> in toto corde
exquirunt eum.</l>
</lg>
<milestone type="Endofpoetry"/>
... </div>
-----
At the moment, when matching text() to create the lines, I look
back (preceding:: or preceding-sibling::) to the span and grab the
span/@id to create the l/@xml:id... but in instances like psalm 118,
where another div or span gets in the way, it tends to muck up.
So I'm convinced there is probably an entirely better way to do this.
Any suggestions?
Many Thanks,
-James
--
James Cummings, Cummings dot James at GMail dot com
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--