I have a source document that uses a hierarchy to mark up the structure
of the text of a manuscript (<div> for the major divisions and <p> for the
paragraphs) and milestone tags for page breaks (<pb>) and line breaks
(<lb>), which may occur virtually anywhere inside the hierarchy, for
example:
<doc>
<pb n="1" />
<div>
<p>Line A
<lb/>Line B
<pb n="2" />
<lb/>Line C
</p>
<p>Line D
<lb/>Line E
<lb/>Line F
</p>
<pb n="3" />
<p>Line G
<lb/>Line H
<lb/>Line I
</p>
</div>
<div>
<p>Line J
<lb/>Line K
<lb/>Line L
</p>
</div>
</doc>
I would like to transform this document into a nested structure of <page>
and <line> tags and mark up the textual divisions as milestones:
<doc>
<page n="1">
<newdiv/>
<newp/>
<line n="1.1">Line A</line>
<line n="1.2">Line B</line>
</page>
<page n="2">
<line n="2.1">Line C</line>
<newp/>
<line n="2.2">Line D</line>
<line n="2.3">Line E</line>
<line n="2.4">Line F</line>
</page>
<page n="3">
<newp/>
<line n="3.1">Line G</line>
<line n="3.2">Line H</line>
<line n="3.3">Line I</line>
<newdiv/>
<newp/>
<line n="3.4">Line J</line>
<line n="3.5">Line K</line>
<line n="3.6">Line L</line>
</page>
</doc>
What is the best strategy for doing this? (My main problem is selecting
the nodes that lie between <pb> tags appearing at different levels of the
hierarchy.)
Dieter Köhler
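
One possible way to approach the transformation described above is to stop
selecting nodes "between" <pb> tags altogether: flatten the document into a
stream of events in document order, then rebuild the page/line nesting from
that stream. The sketch below does this in Python with the standard
xml.etree.ElementTree module rather than in a stylesheet language; the names
flatten and regroup are hypothetical helpers, not part of any library. It
assumes that every non-empty run of text is exactly one line (as in the
sample), that <lb> therefore carries no extra information, and that anything
before the first <pb> can be dropped.

```python
import xml.etree.ElementTree as ET

SOURCE = """<doc>
<pb n="1" />
<div>
<p>Line A
<lb/>Line B
<pb n="2" />
<lb/>Line C
</p>
<p>Line D
<lb/>Line E
<lb/>Line F
</p>
<pb n="3" />
<p>Line G
<lb/>Line H
<lb/>Line I
</p>
</div>
<div>
<p>Line J
<lb/>Line K
<lb/>Line L
</p>
</div>
</doc>"""


def flatten(elem):
    """Yield (kind, value) events for elem and its subtree in document order."""
    if elem.tag == "pb":
        yield ("pb", elem.get("n", ""))
    elif elem.tag in ("div", "p"):
        # The start of a <div> or <p> becomes a <newdiv/> or <newp/> milestone.
        yield ("milestone", "new" + elem.tag)
    # <lb> emits nothing: under the stated assumption, text runs delimit lines.
    if elem.text and elem.text.strip():
        yield ("text", elem.text.strip())
    for child in elem:
        yield from flatten(child)
        # Text following a child (e.g. after <lb/>) is stored as its tail.
        if child.tail and child.tail.strip():
            yield ("text", child.tail.strip())


def regroup(source):
    """Rebuild the event stream as <page> elements containing <line> elements."""
    out = ET.Element("doc")
    page = None
    count = 0
    for kind, value in flatten(ET.fromstring(source)):
        if kind == "pb":
            # A page break closes the current page and opens the next one.
            page = ET.SubElement(out, "page", n=value)
            count = 0
        elif page is None:
            continue  # assumption: content before the first <pb> is dropped
        elif kind == "milestone":
            ET.SubElement(page, value)
        else:
            count += 1
            line = ET.SubElement(page, "line", n=f"{page.get('n')}.{count}")
            line.text = value
    return out
```

Calling ET.tostring(regroup(SOURCE), encoding="unicode") serializes the
result. Because the hierarchy is discarded before regrouping, it no longer
matters at which level a <pb> occurs; the price is that any other attributes
or inline markup in the source would need extra event kinds to survive.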