I have a solution for the up-converting problem that I had. It isn't as
elegant as I was hoping for. Maybe someone here can give me a few more
pointers.
Thanks again for including the for-each-group structure, as that makes the
solution much easier.
My general problem is conversion of a flat XML (WordML) document to one
with hierarchy.
After tossing out all of the formatting info, the first step is to map all
the paragraphs that indicate divs to their appropriate level. I use a, b, c,
d, ... for the new element names in order to make this more generic.
The a, b, c, ... elements hold the head or title of each div.
The div may be nested: a, b, c, d.
Some divs may be omitted: a, b, d.
Divs may be followed by other divs or paragraphs. Paragraphs may contain
spans.
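The mapping step could look roughly like the sketch below. It rests on several assumptions that are not taken from the real document: the source is Word 2003 WordML (the w namespace shown), the headings carry the style names Heading1 to Heading3, and every other paragraph becomes a placeholder p element. Spans are left out to keep it short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="w">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/">
<document>
<!-- Visit every paragraph in document order; everything else is ignored. -->
<xsl:apply-templates select="//w:body//w:p"/>
</document>
</xsl:template>
<!-- Assumed style names: Heading1, Heading2, Heading3 mark levels a, b, c. -->
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'Heading1']">
<a><xsl:apply-templates select=".//w:t"/></a>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'Heading2']">
<b><xsl:apply-templates select=".//w:t"/></b>
</xsl:template>
<xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'Heading3']">
<c><xsl:apply-templates select=".//w:t"/></c>
</xsl:template>
<!-- Any other paragraph becomes a plain p (a placeholder name). -->
<xsl:template match="w:p">
<p><xsl:apply-templates select=".//w:t"/></p>
</xsl:template>
<!-- Keep only the text of each run; the formatting is tossed out. -->
<xsl:template match="w:t">
<xsl:value-of select="."/>
</xsl:template>
</xsl:stylesheet>
```

Deeper levels would need one more template per heading style, following the same pattern.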
Next use the for-each-group structure to put an aa element around each group
that starts with an a element.
Next use the for-each-group structure to put a bb element around each group
that starts with a b element, and rename aa to aaa.
...
Each of these steps builds the required hierarchy one step at a time.
Since some divs may be omitted I couldn't find a way to combine these
steps.
Next the head/title is pulled out.
Finally, toss out any div with no head/title.
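One possible way to fold these passes into a single one is a recursive function that groups on one level and then calls itself for the next level. The following is a sketch rather than a tested stylesheet. It assumes an XSLT 2.0 processor; the function name f:nest, its namespace urn:example:nest and the $levels variable are all made up for illustration. When a level is omitted, the group does not start with that level's head, so the function simply tries the next level on the same nodes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:nest"
    exclude-result-prefixes="xs f">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<!-- Assumption: head element names, outermost level first. -->
<xsl:variable name="levels" as="xs:string*" select="('a', 'b', 'c', 'd', 'e')"/>
<xsl:template match="document">
<document>
<xsl:sequence select="f:nest(*, 1)"/>
</document>
</xsl:template>
<!-- Hypothetical helper: nests $nodes for level $depth and all deeper levels. -->
<xsl:function name="f:nest" as="node()*">
<xsl:param name="nodes" as="element()*"/>
<xsl:param name="depth" as="xs:integer"/>
<xsl:variable name="head" as="xs:string?" select="$levels[$depth]"/>
<xsl:choose>
<!-- Past the deepest level: copy what is left unchanged. -->
<xsl:when test="empty($head)">
<xsl:copy-of select="$nodes"/>
</xsl:when>
<xsl:otherwise>
<xsl:for-each-group select="$nodes" group-starting-with="*[local-name() = $head]">
<xsl:choose>
<!-- The group opens with a head of this level: wrap it in a div. -->
<xsl:when test="local-name() = $head">
<xsl:element name="div-{$head}">
<title><xsl:copy-of select="node()"/></title>
<xsl:sequence select="f:nest(current-group()[position() gt 1], $depth + 1)"/>
</xsl:element>
</xsl:when>
<!-- This level is omitted here: try the next level on the same nodes. -->
<xsl:otherwise>
<xsl:sequence select="f:nest(current-group(), $depth + 1)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each-group>
</xsl:otherwise>
</xsl:choose>
</xsl:function>
</xsl:stylesheet>
```

Because a div is only created when a head is actually present, this approach never produces a div without a title, so the final clean-up step would not be needed.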
sample input
<?xml version="1.0" encoding="UTF-8"?>
<document>
<a>level aaaa head 1</a>
<b>level bbbb head 2</b>
<c>level ccccc head 3</c>
<dfg>cc 4 blah</dfg>
<e>level eeee head 5 </e>
<fhh>cc blah 6</fhh>
<c>level ccccc head 7</c>
<df>cc 8 blah<kkk>kkk within df within c</kkk>
</df>
<d>level dddd head 9</d>
<iuo>dd 10 blah</iuo>
<jtt>dd blah 11</jtt>
<c>level ccccc head 12</c>
<df>cc 13 blah</df>
<e>cc level eeeee head 14</e>
<fss>ee blah 15</fss>
<b>level bbbbb head 16</b>
<c>level ccccc head 17</c>
<df>cc 18 blah</df>
<e>cc level eeeee head 19</e>
<fhy>ee blah 20</fhy>
</document>
and the required output is
<?xml version="1.0" encoding="UTF-8"?>
<document>
<div-a>
<title>level aaaa head 1</title>
<div-b>
<title>level bbbb head 2</title>
<div-c>
<title>level ccccc head 3</title>
<dfg>cc 4 blah</dfg>
<div-e>
<title>level eeee head 5 </title>
<fhh>cc blah 6</fhh>
</div-e>
</div-c>
<div-c>
<title>level ccccc head 7</title>
<df>cc 8 blah<kkk>kkk within df within c</kkk>
</df>
<div-d>
<title>level dddd head 9</title>
<iuo>dd 10 blah</iuo>
<jtt>dd blah 11</jtt>
</div-d>
</div-c>
<div-c>
<title>level ccccc head 12</title>
<df>cc 13 blah</df>
<div-e>
<title>cc level eeeee head 14</title>
<fss>ee blah 15</fss>
</div-e>
</div-c>
</div-b>
<div-b>
<title>level bbbbb head 16</title>
<div-c>
<title>level ccccc head 17</title>
<df>cc 18 blah</df>
<div-e>
<title>cc level eeeee head 19</title>
<fhy>ee blah 20</fhy>
</div-e>
</div-c>
</div-b>
</div-a>
</document>
The first grouping pass uses the for-each-group structure to put an aa
element around each group that starts with an a element.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="document">
<document>
<xsl:for-each-group select="*" group-starting-with="a">
<aa>
<xsl:copy-of select="current-group()"/>
</aa>
</xsl:for-each-group>
</document>
</xsl:template>
<xsl:template match="@*|node()" name="copy-current-node">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
with output of
<?xml version="1.0" encoding="UTF-8"?>
<document>
<aa>
<a>level aaaa head 1</a>
<b>level bbbb head 2</b>
<c>level ccccc head 3</c>
<dfg>cc 4 blah</dfg>
<e>level eeee head 5 </e>
<fhh>cc blah 6</fhh>
<c>level ccccc head 7</c>
<df>cc 8 blah<kkk>kkk within df within c</kkk>
</df>
<d>level dddd head 9</d>
<iuo>dd 10 blah</iuo>
<jtt>dd blah 11</jtt>
<c>level ccccc head 12</c>
<df>cc 13 blah</df>
<e>cc level eeeee head 14</e>
<fss>ee blah 15</fss>
<b>level bbbbb head 16</b>
<c>level ccccc head 17</c>
<df>cc 18 blah</df>
<e>cc level eeeee head 19</e>
<fhy>ee blah 20</fhy>
</aa>
</document>
Next use the for-each-group structure to put a bb element around each group
that starts with a b element, and rename aa to aaa. Anything before the first
b (here the a head) ends up in a bb of its own.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="document">
<document>
<xsl:apply-templates/>
</document>
</xsl:template>
<xsl:template match="aa">
<aaa>
<xsl:for-each-group select="*" group-starting-with="b">
<bb>
<xsl:copy-of select="current-group()"/>
</bb>
</xsl:for-each-group>
</aaa>
</xsl:template>
<xsl:template match="@*|node()" name="copy-current-node">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
with output of
<?xml version="1.0" encoding="UTF-8"?>
<document>
<aaa>
<bb>
<a>level aaaa head 1</a>
</bb>
<bb>
<b>level bbbb head 2</b>
<c>level ccccc head 3</c>
<dfg>cc 4 blah</dfg>
<e>level eeee head 5 </e>
<fhh>cc blah 6</fhh>
<c>level ccccc head 7</c>
<df>cc 8 blah<kkk>kkk within df within c</kkk>
</df>
<d>level dddd head 9</d>
<iuo>dd 10 blah</iuo>
<jtt>dd blah 11</jtt>
<c>level ccccc head 12</c>
<df>cc 13 blah</df>
<e>cc level eeeee head 14</e>
<fss>ee blah 15</fss>
</bb>
<bb>
<b>level bbbbb head 16</b>
<c>level ccccc head 17</c>
<df>cc 18 blah</df>
<e>cc level eeeee head 19</e>
<fhy>ee blah 20</fhy>
</bb>
</aaa>
</document>
Continue adding the levels in the same way. Here bb becomes bbb and each
group that starts with a c element is wrapped in cc.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="document">
<document>
<xsl:apply-templates/>
</document>
</xsl:template>
<xsl:template match="aaa">
<aaa>
<xsl:apply-templates/>
</aaa>
</xsl:template>
<xsl:template match="bb">
<bbb>
<xsl:for-each-group select="*" group-starting-with="c">
<cc>
<xsl:copy-of select="current-group()"/>
</cc>
</xsl:for-each-group>
</bbb>
</xsl:template>
<xsl:template match="@*|node()" name="copy-current-node">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
with output of
<?xml version="1.0" encoding="UTF-8"?>
<document>
<aaa>
<bbb>
<cc>
<a>level aaaa head 1</a>
</cc>
</bbb>
<bbb>
<cc>
<b>level bbbb head 2</b>
</cc>
<cc>
<c>level ccccc head 3</c>
<dfg>cc 4 blah</dfg>
<e>level eeee head 5 </e>
<fhh>cc blah 6</fhh>
</cc>
<cc>
<c>level ccccc head 7</c>
<df>cc 8 blah<kkk>kkk within df within c</kkk>
</df>
<d>level dddd head 9</d>
<iuo>dd 10 blah</iuo>
<jtt>dd blah 11</jtt>
</cc>
<cc>
<c>level ccccc head 12</c>
<df>cc 13 blah</df>
<e>cc level eeeee head 14</e>
<fss>ee blah 15</fss>
</cc>
</bbb>
<bbb>
<cc>
<b>level bbbbb head 16</b>
</cc>
<cc>
<c>level ccccc head 17</c>
<df>cc 18 blah</df>
<e>cc level eeeee head 19</e>
<fhy>ee blah 20</fhy>
</cc>
</bbb>
</aaa>
</document>
....
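Finally, at aaa check whether there is a descendant a. If so, that is the
title for this group; otherwise the group has no title. The bbb, ccc, ddd and
eee templates apply the same test at their own levels.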
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8"
indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="document">
<document>
<xsl:apply-templates/>
</document>
</xsl:template>
<xsl:template match="aaa">
<div-a>
<xsl:choose>
<xsl:when test="descendant::a">
<title>
<xsl:apply-templates
select="descendant::a"/>
</title>
<xsl:apply-templates
select="child::*"/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates
select="child::*"/>
</xsl:otherwise>
</xsl:choose>
</div-a>
</xsl:template>
<xsl:template match="bbb">
<xsl:choose>
<xsl:when test="descendant::b">
<div-b>
<title>
<xsl:apply-templates
select="descendant::b"/>
</title>
<xsl:apply-templates
select="child::*"/>
</div-b>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="child::*"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="ccc">
<xsl:choose>
<xsl:when test="descendant::c">
<div-c>
<title>
<xsl:apply-templates
select="descendant::c"/>
</title>
<xsl:apply-templates
select="descendant::*[preceding-sibling::c]"/>
<xsl:apply-templates
select="child::*"/>
</div-c>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates
select="child::*[not(c)]|descendant::*[preceding-sibling::c]"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="ddd">
<xsl:choose>
<xsl:when test="descendant::d">
<div-d>
<title>
<xsl:apply-templates
select="descendant::d"/>
</title>
<xsl:apply-templates
select="descendant::*[preceding-sibling::d]"/>
<xsl:apply-templates
select="child::*"/>
</div-d>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates
select="child::*|descendant::*[preceding-sibling::d]"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="eee">
<xsl:choose>
<xsl:when test="descendant::e">
<div-e>
<title>
<xsl:apply-templates
select="descendant::e"/>
</title>
<xsl:apply-templates
select="descendant::*[preceding-sibling::e]"/>
<xsl:apply-templates
select="child::*"/>
</div-e>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates
select="child::*|descendant::*[preceding-sibling::e]"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="a|b|c|d|e|f|g|h|i">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="@*|node()" >
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Then we can get rid of divs that have no title, which solves the missing div
problem.
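That last clean-up could be sketched as below. It assumes that "getting rid of" a div means dropping the wrapper while keeping its contents, that every div is named div-something, and that the title sits directly inside its div. None of this is fixed by the stylesheets above, so treat it as an illustration only.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>
<!-- Assumption: a div is any element named div-* and its title is a direct child. -->
<xsl:template match="*[starts-with(local-name(), 'div-')][not(title)]">
<!-- Drop the untitled wrapper but keep whatever it contains. -->
<xsl:apply-templates/>
</xsl:template>
<!-- Everything else is copied through unchanged. -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
```

The first template wins over the identity template because a pattern with predicates has a higher default priority than node().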
Jim Albright
704 843-0582
Wycliffe Bible Translators