xsl-list

RE: Grouping text nodes

2005-08-03 03:56:59
In XSLT 1.0 I would tackle this using the technique that I've started
referring to as "sibling recursion". The general pattern is:

(a) From the parent element do

   <xsl:apply-templates select="child::node()[1]" mode="sibling-recursion"/>

(b) Write one or more templates that match the child nodes (elements and
text); each has this structure:

<xsl:template match="xxx" mode="sibling-recursion">
   <!-- ... process this node ... -->
   <xsl:apply-templates select="following-sibling::node()[1]"
                        mode="sibling-recursion">
      <!-- ... with-params ... -->
   </xsl:apply-templates>
</xsl:template>
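Here is a minimal, hedged XSLT 1.0 sketch of this pattern applied to the line-grouping problem in the question below. It assumes simplified, well-formed input in no namespace: a single div whose children are span class="vn" verse markers, text and br elements, with entities such as &nbsp; already resolved and no nested divs. The output element names "lines" and "l" and the mode name "line-content" are illustrative choices. The point is that the current verse id and line number travel forward as parameters, so nothing has to look backwards.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Entry point: start the recursion at the first child of the div. -->
  <xsl:template match="div">
    <lines>
      <xsl:apply-templates select="node()[1]" mode="sibling-recursion"/>
    </lines>
  </xsl:template>

  <!-- Default: skip this node (e.g. whitespace after a br) and pass the
       current verse id and line number on to the next sibling. -->
  <xsl:template match="node()" mode="sibling-recursion">
    <xsl:param name="verse"/>
    <xsl:param name="line" select="1"/>
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      <xsl:with-param name="verse" select="$verse"/>
      <xsl:with-param name="line" select="$line"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- A verse number starts a new verse and resets the line counter. -->
  <xsl:template match="span[@class='vn']" mode="sibling-recursion">
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="sibling-recursion">
      <xsl:with-param name="verse" select="string(@id)"/>
      <xsl:with-param name="line" select="1"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Non-blank text starts a line: wrap it and the nodes up to the next
       br in an l element, then resume after that br with line + 1. -->
  <xsl:template match="text()[normalize-space()]" mode="sibling-recursion">
    <xsl:param name="verse"/>
    <xsl:param name="line" select="1"/>
    <l xml:id="{$verse}-{$line}">
      <xsl:apply-templates select="." mode="line-content"/>
    </l>
    <xsl:apply-templates
        select="following-sibling::br[1]/following-sibling::node()[1]"
        mode="sibling-recursion">
      <xsl:with-param name="verse" select="$verse"/>
      <xsl:with-param name="line" select="$line + 1"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- Inside a line: copy each node and move on until a br is reached. -->
  <xsl:template match="node()" mode="line-content">
    <xsl:copy-of select="."/>
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="line-content"/>
  </xsl:template>

  <!-- The br ends the line: it is not copied and the inner recursion stops. -->
  <xsl:template match="br" mode="line-content"/>
</xsl:stylesheet>
```

For an input such as <div><span class="vn" id="x1_1">1</span> Beatus vir,<br/> et in via,<br/> <span class="vn" id="x1_2">2</span> sed in lege.<br/></div> this should produce three l elements with xml:id values x1_1-1, x1_1-2 and x1_2-1. A line that begins with an element rather than text (such as a speaker span) is skipped by the default template, so real data would need extra templates.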

In XSLT 2.0, converting "text<br/>" to "<line>text</line>" is often
conveniently done with xsl:for-each-group and group-ending-with="br".
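As a hedged XSLT 2.0 sketch, under the same simplified assumptions (input in no namespace, every verse begins with a span class="vn", every line ends with a br, no nested divs), two nested xsl:for-each-group instructions can build the verse and line structure. group-starting-with splits the div into verses, and group-ending-with="br" splits each verse into lines. The lg and l names follow the target format in the question below; the rest is illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: every verse starts with span[@class='vn'] and every
       line ends with br; nested divs are not handled here. -->
  <xsl:template match="div">
    <div>
      <!-- Ignore whitespace-only text nodes, then start a new group
           at each verse number. -->
      <xsl:for-each-group
          select="node()[not(self::text()[not(normalize-space())])]"
          group-starting-with="span[@class='vn']">
        <!-- The context item is the first node of the group: the vn span. -->
        <xsl:variable name="verse" select="string(@id)"/>
        <lg xml:id="{$verse}" n="{.}">
          <!-- Within a verse, a line is everything up to and including a br. -->
          <xsl:for-each-group select="current-group() except ."
                              group-ending-with="br">
            <!-- position() is the number of this line group within the verse. -->
            <l xml:id="{$verse}-{position()}">
              <xsl:copy-of select="current-group()[not(self::br)]"/>
            </l>
          </xsl:for-each-group>
        </lg>
      </xsl:for-each-group>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

Because position() counts groups within each inner xsl:for-each-group, the line numbers restart for every verse without any backward navigation.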


This doesn't by itself help with your problem of handling the irregularities
in your input data. I think that when you have such irregularities, it's
often best to write a multiphase transformation in which each phase tries to
make the structure a bit more regular, making it easier for subsequent
phases to do their work.
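Here is a hedged XSLT 2.0 sketch of such a two-phase transformation. It assumes well-formed input in no namespace, with entities such as &nbsp; already resolved. Phase 1 (mode "flatten", a name chosen for illustration) replaces any div nested inside the chapter with start and end milestones around its children. This removes the Psalm 118 irregularity where the div starts in the middle of a verse. Phase 2 (mode "group") then groups verses and lines over the flat intermediate tree. In XSLT 1.0 the same structure would need an extension such as exsl:node-set() to reprocess the variable.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Phase 1: build a more regular intermediate tree in a variable. -->
    <xsl:variable name="flat">
      <xsl:apply-templates select="node()" mode="flatten"/>
    </xsl:variable>
    <!-- Phase 2: run the real transformation over the intermediate tree. -->
    <xsl:apply-templates select="$flat/node()" mode="group"/>
  </xsl:template>

  <!-- Phase 1: identity copy by default... -->
  <xsl:template match="@*|node()" mode="flatten">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="flatten"/>
    </xsl:copy>
  </xsl:template>

  <!-- ...except that a div nested inside the chapter becomes start and end
       milestones around its children, so all vn spans and br elements end
       up as siblings at one level. -->
  <xsl:template match="div[@class='chapter']//div" mode="flatten">
    <milestone type="{@class}"/>
    <xsl:apply-templates select="node()" mode="flatten"/>
    <milestone type="EndOf{@class}"/>
  </xsl:template>

  <!-- Phase 2: group the now-flat chapter into verses and lines. -->
  <xsl:template match="div[@class='chapter']" mode="group">
    <div type="chapter" n="{span[@class='chapter-num']}">
      <xsl:for-each-group
          select="node()[not(self::span[@class='chapter-num'])]
                        [not(self::text()[not(normalize-space())])]"
          group-starting-with="span[@class='vn']">
        <xsl:choose>
          <!-- A verse: split it into lines ending with br. -->
          <xsl:when test="self::span[@class='vn']">
            <xsl:variable name="verse" select="string(@id)"/>
            <lg xml:id="{$verse}" n="{.}">
              <xsl:for-each-group select="current-group() except ."
                                  group-ending-with="br">
                <l xml:id="{$verse}-{position()}">
                  <xsl:copy-of select="current-group()[not(self::br)]"/>
                </l>
              </xsl:for-each-group>
            </lg>
          </xsl:when>
          <!-- Anything before the first verse (e.g. a milestone) is copied. -->
          <xsl:otherwise>
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </div>
  </xsl:template>
</xsl:stylesheet>
```

One known gap in this sketch is the closing milestone. It follows the last br, so it ends up in a final l of the last verse. Moving it out is the kind of small job a third phase could do.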

But I'm afraid these are only rough ideas - I don't have time to get
immersed in the detail of what looks quite a challenging problem.

Michael Kay
http://www.saxonica.com/


-----Original Message-----
From: James Cummings [mailto:cummings(_dot_)james(_at_)gmail(_dot_)com] 
Sent: 03 August 2005 10:49
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Grouping text nodes

Hi there,

I have some XHTML that I'm trying to transform to add more structure. It is
a copy of the Latin Vulgate Bible. Currently the XHTML looks something like
this:
-----
<div class="chapter">
<span class="chapter-num">1</span>
    <div class="poetrystartchapter">
        <span class="vn" id="x1_1">1</span>&nbsp;Beatus vir qui
        non abiit in consilio impiorum,<br/> et in via peccatorum
        non stetit,<br/> et in cathedra pestilenti&aelig;
        non sedit&nbsp;;<br/>
        <span class="vn" id="x1_2">2</span>&nbsp;sed in lege
        Domini voluntas ejus,<br/> et in lege ejus meditabitur die
        ac nocte.<br/>
        <span class="vn" id="x1_3">3</span>&nbsp;Et erit tamquam
        lignum quod plantatum est secus decursus aquarum,<br/> quod
        fructum suum dabit in tempore suo&nbsp;:<br/> et folium
        ejus non defluet&nbsp;;<br/> et omnia
        qu&aelig;cumque faciet prosperabuntur.<br/>
...</div>...</div>
-----
What I want to get is something like:
-----
<div type="chapter" n="1">
    <milestone type="poetrystartchapter"/>
    <lg xml:id="x1_1" n="1">
        <l xml:id="x1_1-1">Beatus vir qui
        non abiit in consilio impiorum,</l>
        <l xml:id="x1_1-2">et in via peccatorum non stetit,</l>
        <l xml:id="x1_1-3">et in cathedra
        pestilenti&aelig; non sedit </l>
    </lg>
    <lg xml:id="x1_2" n="2">
        <l xml:id="x1_2-1"> sed in lege Domini voluntas ejus,</l>
        <l xml:id="x1_2-2">et in lege ejus meditabitur die
        ac nocte.</l>
    </lg>
    <lg xml:id="x1_3">
        <l xml:id="x1_3-1"> Et erit tamquam
        lignum quod plantatum est secus decursus aquarum,</l>
        <l xml:id="x1_3-2"> quod fructum suum dabit in
        tempore suo :</l>
        <l xml:id="x1_3-3"> et folium ejus non defluet;</l>
        <l xml:id="x1_3-4"> et omnia qu&aelig;cumque
        faciet prosperabuntur.</l>
    </lg>
<milestone type="EndOfpoetrystartchapter"/>
...</div>
-----
My problem comes when I look backwards to create the @xml:id for each line
while grouping the text nodes into lines. Sometimes extra existing structure
gets in the way, for example where the <div> (if present at all) starts
after the first line:

-----
<div class="chapter"><span class="chapter-num">118</span>
    <span class="vn" id="x118_1">1</span>&nbsp;Alleluja.
        <div class="poetry"><span class="speaker">Aleph.</span> Beati
        immaculati in via,<br/> qui ambulant in lege Domini.<br/>
        <span class="vn" id="x118_2">2</span>&nbsp;Beati qui
        scrutantur testimonia ejus&nbsp;;<br/> in toto corde
        exquirunt eum.<br/>
-----
which is supposed to come out something like:
-----
<div type="chapter" n="118">
    <lg xml:id="x118_1" n="1">
        <l xml:id="x118_1-1">Alleluja.
            <milestone type="poetry"/>
            <seg type="speaker">Aleph.</seg> Beati immaculati in via,</l>
        <l xml:id="x118_1-2"> qui ambulant in lege Domini.</l>
    </lg>
    <lg>
        <l xml:id="x118_2-1"> Beati qui scrutantur testimonia ejus; </l>
        <l xml:id="x118_2-2"> in toto corde exquirunt eum.</l>
    </lg>
    <milestone type="Endofpoetry"/>
... </div>
-----
At the moment, when matching text() to create the lines, I look back (with
preceding:: or preceding-sibling::) to the span and grab its @id to create
the l/@xml:id. But in cases like Psalm 118, where another div or span gets
in the way, this tends to go wrong.

So I suspect there is an entirely better way to do this.
Any suggestions?

Many Thanks,
-James

-- 
James Cummings, Cummings dot James at GMail dot com

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--