xsl-list

RE: Re: up-converting

2004-09-28 06:23:33
I haven't looked through your code in detail, but it looks similar to a
problem I used as an exercise at the Oxford Summer School. Here we had a set
of records with COBOL-like level numbers

<A level="1"/>
<B level="2"/>
<C level="3"/>
<D level="2"/>

and the task was to create a hierarchically nested structure. (The actual
input was a GEDCOM file.)

The solution is a recursive grouping like this:

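<xsl:template name="g">
 <xsl:param name="sequence" as="element()*"/>
 <xsl:param name="level" as="xs:integer"/>
 <!-- each group starts at a record carrying the current level number -->
 <xsl:for-each-group select="$sequence"
                     group-starting-with="*[@level=$level]">
  <xsl:copy>
    <!-- the rest of the group becomes the content, one level deeper -->
    <xsl:call-template name="g">
      <xsl:with-param name="sequence" select="current-group() except ."/>
      <xsl:with-param name="level" select="$level+1"/>
    </xsl:call-template>
  </xsl:copy>
 </xsl:for-each-group>
</xsl:template>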
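
To try the template out, it needs a caller and a declaration for the xs
prefix. What follows is only a minimal sketch, assuming the records sit as
children of a wrapper element (called records here, a name made up for
the illustration) and that numbering starts at level 1. The xsl:copy-of
for attributes is also an addition: xsl:copy on its own does not copy the
level attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <!-- "records" is a hypothetical wrapper element, not from the original post -->
  <xsl:template match="records">
    <records>
      <xsl:call-template name="g">
        <xsl:with-param name="sequence" select="*"/>
        <xsl:with-param name="level" select="1"/>
      </xsl:call-template>
    </records>
  </xsl:template>

  <xsl:template name="g">
    <xsl:param name="sequence" as="element()*"/>
    <xsl:param name="level" as="xs:integer"/>
    <xsl:for-each-group select="$sequence"
                        group-starting-with="*[@level=$level]">
      <!-- the context item is the first record of the current group -->
      <xsl:copy>
        <!-- added for the sketch: keep the record's attributes -->
        <xsl:copy-of select="@*"/>
        <xsl:call-template name="g">
          <xsl:with-param name="sequence" select="current-group() except ."/>
          <xsl:with-param name="level" select="$level + 1"/>
        </xsl:call-template>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:template>
</xsl:stylesheet>
```

For the four records shown above, this should nest B and D inside A, with
C inside B.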

Now it seems to me your problem is very similar, except you have no explicit
level number. But I think you could use a similar approach, where the same
template is used for each level of grouping and the only thing that changes
is the grouping key.
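
As a rough sketch of that idea (not a tested solution), the level number
can be replaced by a list of head element names, outermost first. The
template name nest, the parameter heads and the list ('a' to 'e') are
assumptions made for the illustration, based on the sample input in the
quoted message below. A group that does not start with the head for the
current level is passed down unwrapped, which is one possible way of
coping with omitted levels.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="document">
    <document>
      <xsl:call-template name="nest">
        <xsl:with-param name="sequence" select="*"/>
        <!-- assumed head names, outermost level first -->
        <xsl:with-param name="heads" select="('a', 'b', 'c', 'd', 'e')"/>
      </xsl:call-template>
    </document>
  </xsl:template>

  <!-- "nest" and "heads" are hypothetical names for this sketch -->
  <xsl:template name="nest">
    <xsl:param name="sequence" as="element()*"/>
    <xsl:param name="heads" as="xs:string*"/>
    <xsl:choose>
      <!-- no levels left: copy whatever remains unchanged -->
      <xsl:when test="empty($heads)">
        <xsl:copy-of select="$sequence"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="head" select="$heads[1]"/>
        <xsl:variable name="rest" select="$heads[position() gt 1]"/>
        <xsl:for-each-group select="$sequence"
                            group-starting-with="*[local-name() = $head]">
          <xsl:choose>
            <!-- the group starts with a head for this level: wrap it -->
            <xsl:when test="local-name() = $head">
              <xsl:element name="div-{$head}">
                <title><xsl:copy-of select="node()"/></title>
                <xsl:call-template name="nest">
                  <xsl:with-param name="sequence"
                                  select="current-group() except ."/>
                  <xsl:with-param name="heads" select="$rest"/>
                </xsl:call-template>
              </xsl:element>
            </xsl:when>
            <!-- no head at this level: pass the group down without a wrapper -->
            <xsl:otherwise>
              <xsl:call-template name="nest">
                <xsl:with-param name="sequence" select="current-group()"/>
                <xsl:with-param name="heads" select="$rest"/>
              </xsl:call-template>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each-group>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Traced by hand against the sample input below, this appears to give the
div-a / div-b / div-c nesting asked for, with div-d and div-e appearing
only where a d or e head is present.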

Michael Kay
http://www.saxonica.com/
 

-----Original Message-----
From: Jim_Albright(_at_)wycliffe(_dot_)org 
[mailto:Jim_Albright(_at_)wycliffe(_dot_)org] 
Sent: 28 September 2004 13:07
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: RE: [xsl] Re: up-converting

I have a solution for the up-converting problem that I had. It isn't as
elegant as I was hoping for. Maybe someone here can give me a few more
pointers.
Thanks again for including the for-each-group structure, as that makes
the solution much easier.

My general problem is conversion of a flat XML (WordML) document to one
with hierarchy.
After tossing out all of the formatting info, the first step is to map
all the paragraphs that indicate divs to their appropriate level. I use
a, b, c, d, ... for new element names in order to make this more generic.
The a, b, c, ... elements indicate the head or title for the div.
The divs may be nested: a, b, c, d.
Some divs may be omitted: a, b, d.
Divs may be followed by other divs or paragraphs. Paragraphs may contain
spans.

Next use the for-each-group structure to put an aa element around the a
elements.
Next use the for-each-group structure to put a bb element around the b
elements, and aaa instead of aa.
...
Each of these steps builds the required hierarchy one step at a time.
Since some divs may be omitted I couldn't find a way to combine these
steps.
Next the head/title is pulled out.
Finally, toss out any div with no head/title.
sample input
<?xml version="1.0" encoding="UTF-8"?>
<document>
        <a>level aaaa head 1</a>
        <b>level bbbb head 2</b>
        <c>level ccccc head 3</c>
        <dfg>cc 4 blah</dfg>
        <e>level eeee head 5 </e>
        <fhh>cc blah 6</fhh>
        <c>level ccccc head 7</c>
        <df>cc 8 blah<kkk>kkk within df within c</kkk>
        </df>
        <d>level dddd head 9</d>
        <iuo>dd 10 blah</iuo>
        <jtt>dd blah 11</jtt>
        <c>level ccccc head 12</c>
        <df>cc 13 blah</df>
        <e>cc level eeeee  head 14</e>
        <fss>ee blah 15</fss>
        <b>level bbbbb head 16</b>
        <c>level ccccc  head 17</c>
        <df>cc 18  blah</df>
        <e>cc level eeeee head 19</e>
        <fhy>ee blah 20</fhy>
</document>

and the required output is
<?xml version="1.0" encoding="UTF-8"?>
<document>
   <div-a>
      <title>level aaaa head 1</title>
      <div-b>
         <title>level bbbb head 2</title>
         <div-c>
            <title>level ccccc head 3</title>
            <dfg>cc 4 blah</dfg>
            <div-e>
               <title>level eeee head 5 </title>
               <fhh>cc blah 6</fhh>
            </div-e>
         </div-c>
         <div-c>
            <title>level ccccc head 7</title>
            <df>cc 8 blah<kkk>kkk within df within c</kkk>
            </df>
            <div-d>
               <title>level dddd head 9</title>
               <iuo>dd 10 blah</iuo>
               <jtt>dd blah 11</jtt>
            </div-d>
         </div-c>
         <div-c>
            <title>level ccccc head 12</title>
            <df>cc 13 blah</df>
            <div-e>
               <title>cc level eeeee  head 14</title>
               <fss>ee blah 15</fss>
            </div-e>
         </div-c>
      </div-b>
      <div-b>
         <title>level bbbbb head 16</title>
         <div-c>
            <title>level ccccc  head 17</title>
            <df>cc 18  blah</df>
            <div-e>
               <title>cc level eeeee head 19</title>
               <fhy>ee blah 20</fhy>
            </div-e>
         </div-c>
      </div-b>
   </div-a>
</document>



Next use the for-each-group structure to put an aa element around the a
elements.

<?xml version="1.0" encoding="UTF-8"?>
<!-- version 2.0: xsl:for-each-group is an XSLT 2.0 instruction -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="document">
    <document>
      <xsl:for-each-group select="*" group-starting-with="a">
        <aa>
          <xsl:for-each select="current-group()">
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </aa>
      </xsl:for-each-group>
    </document>
  </xsl:template>
  <xsl:template match="@*|node()" name="copy-current-node">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
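
Since the inner xsl:for-each does nothing but copy each member of the
group in turn, the same step could probably be written more compactly by
handing the whole group to xsl:copy-of. A sketch of that variant, with
version="2.0" because xsl:for-each-group is an XSLT 2.0 instruction:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="document">
    <document>
      <!-- a new group starts at every a element -->
      <xsl:for-each-group select="*" group-starting-with="a">
        <aa>
          <!-- current-group() is a sequence; copy-of copies every item in it -->
          <xsl:copy-of select="current-group()"/>
        </aa>
      </xsl:for-each-group>
    </document>
  </xsl:template>
</xsl:stylesheet>
```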


with output of

<?xml version="1.0" encoding="UTF-8"?>
<document>
   <aa>
      <a>level aaaa head 1</a>
      <b>level bbbb head 2</b>
      <c>level ccccc head 3</c>
      <dfg>cc 4 blah</dfg>
      <e>level eeee head 5 </e>
      <fhh>cc blah 6</fhh>
      <c>level ccccc head 7</c>
      <df>cc 8 blah<kkk>kkk within df within c</kkk>
      </df>
      <d>level dddd head 9</d>
      <iuo>dd 10 blah</iuo>
      <jtt>dd blah 11</jtt>
      <c>level ccccc head 12</c>
      <df>cc 13 blah</df>
      <e>cc level eeeee  head 14</e>
      <fss>ee blah 15</fss>
      <b>level bbbbb head 16</b>
      <c>level ccccc  head 17</c>
      <df>cc 18  blah</df>
      <e>cc level eeeee head 19</e>
      <fhy>ee blah 20</fhy>
   </aa>
</document>

Next use the for-each-group structure to put a bb element around the b
elements, and aaa instead of aa.
<?xml version="1.0" encoding="UTF-8"?>
<!-- version 2.0: xsl:for-each-group is an XSLT 2.0 instruction -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="document">
    <document>
      <xsl:apply-templates/>
    </document>
  </xsl:template>
  <xsl:template match="aa">
    <aaa>
      <xsl:for-each-group select="*" group-starting-with="b">
        <bb>
          <xsl:for-each select="current-group()">
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </bb>
      </xsl:for-each-group>
    </aaa>
  </xsl:template>
  <xsl:template match="@*|node()" name="copy-current-node">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

<?xml version="1.0" encoding="UTF-8"?>
<document>
   <aaa>
      <bb>
         <a>level aaaa head 1</a>
      </bb>
      <bb>
         <b>level bbbb head 2</b>
         <c>level ccccc head 3</c>
         <dfg>cc 4 blah</dfg>
         <e>level eeee head 5 </e>
         <fhh>cc blah 6</fhh>
         <c>level ccccc head 7</c>
         <df>cc 8 blah<kkk>kkk within df within c</kkk>
 
         </df>
         <d>level dddd head 9</d>
         <iuo>dd 10 blah</iuo>
         <jtt>dd blah 11</jtt>
         <c>level ccccc head 12</c>
         <df>cc 13 blah</df>
         <e>cc level eeeee  head 14</e>
         <fss>ee blah 15</fss>
      </bb>
      <bb>
         <b>level bbbbb head 16</b>
         <c>level ccccc  head 17</c>
         <df>cc 18  blah</df>
         <e>cc level eeeee head 19</e>
         <fhy>ee blah 20</fhy>
      </bb>
   </aaa>
</document>


Continue adding the levels:
<?xml version="1.0" encoding="UTF-8"?>
<!-- version 2.0: xsl:for-each-group is an XSLT 2.0 instruction -->
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="document">
    <document>
      <xsl:apply-templates/>
    </document>
  </xsl:template>
  <xsl:template match="aaa">
    <aaa>
      <xsl:apply-templates/>
    </aaa>
  </xsl:template>
  <xsl:template match="bb">
    <bbb>
      <xsl:for-each-group select="*" group-starting-with="c">
        <cc>
          <xsl:for-each select="current-group()">
            <xsl:copy-of select="."/>
          </xsl:for-each>
        </cc>
      </xsl:for-each-group>
    </bbb>
  </xsl:template>
  <xsl:template match="@*|node()" name="copy-current-node">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

<?xml version="1.0" encoding="UTF-8"?>
<document>
   <aaa>
      <bbb>
         <cc>
            <a>level aaaa head 1</a>
         </cc>
      </bbb>
      <bbb>
         <cc>
            <b>level bbbb head 2</b>
         </cc>
         <cc>
            <c>level ccccc head 3</c>
            <dfg>cc 4 blah</dfg>
            <e>level eeee head 5 </e>
            <fhh>cc blah 6</fhh>
         </cc>
         <cc>
            <c>level ccccc head 7</c>
            <df>cc 8 blah<kkk>kkk within df within c</kkk>
 
 
            </df>
            <d>level dddd head 9</d>
            <iuo>dd 10 blah</iuo>
            <jtt>dd blah 11</jtt>
         </cc>
         <cc>
            <c>level ccccc head 12</c>
            <df>cc 13 blah</df>
            <e>cc level eeeee  head 14</e>
            <fss>ee blah 15</fss>
         </cc>
      </bbb>
      <bbb>
         <cc>
            <b>level bbbbb head 16</b>
         </cc>
         <cc>
            <c>level ccccc  head 17</c>
            <df>cc 18  blah</df>
            <e>cc level eeeee head 19</e>
            <fhy>ee blah 20</fhy>
         </cc>
      </bbb>
   </aaa>
</document>

....


Finally, at aaa, see if there is a descendant a; if so, that is the title
for this group, otherwise there is no title.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:template match="document">
    <document>
      <xsl:apply-templates/>
    </document>
  </xsl:template>
  <xsl:template match="aaa">
    <div-a>
      <xsl:choose>
        <xsl:when test="descendant::a">
          <title>
            <xsl:apply-templates select="descendant::a"/>
          </title>
          <xsl:apply-templates select="child::*"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="child::*"/>
        </xsl:otherwise>
      </xsl:choose>
    </div-a>
  </xsl:template>
  <xsl:template match="bbb">
    <xsl:choose>
      <xsl:when test="descendant::b">
        <div-b>
          <title>
            <xsl:apply-templates select="descendant::b"/>
          </title>
          <xsl:apply-templates select="child::*"/>
        </div-b>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="child::*"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="ccc">
    <xsl:choose>
      <xsl:when test="descendant::c">
        <div-c>
          <title>
            <xsl:apply-templates select="descendant::c"/>
          </title>
          <xsl:apply-templates select="descendant::*[preceding-sibling::c]"/>
          <xsl:apply-templates select="child::*"/>
        </div-c>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="child::*[not(c)]|descendant::*[preceding-sibling::c]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="ddd">
    <xsl:choose>
      <xsl:when test="descendant::d">
        <div-d>
          <title>
            <xsl:apply-templates select="descendant::d"/>
          </title>
          <xsl:apply-templates select="descendant::*[preceding-sibling::d]"/>
          <xsl:apply-templates select="child::*"/>
        </div-d>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="child::*|descendant::*[preceding-sibling::d]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  <xsl:template match="eee">
    <xsl:choose>
      <xsl:when test="descendant::e">
        <div-e>
          <title>
            <xsl:apply-templates select="descendant::e"/>
          </title>
          <xsl:apply-templates select="descendant::*[preceding-sibling::e]"/>
          <xsl:apply-templates select="child::*"/>
        </div-e>
      </xsl:when>
      <xsl:otherwise>
        <xsl:apply-templates select="child::*|descendant::*[preceding-sibling::e]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="a|b|c|d|e|f|g|h|i">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

And then we can get rid of divs that have no title, thus solving the
missing-div problem.
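
One way that last clean-up pass might look is sketched below. It assumes
that "get rid of" means removing only the wrapper and keeping its
content, and that the divs to test are the elements whose names start
with div- and that have no title child. Both points are guesses rather
than something stated above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- assumption: a div-* element without a title child is unwrapped,
       so its children move up one level -->
  <xsl:template match="*[starts-with(local-name(), 'div-')][not(title)]">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- identity template: everything else is copied unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The predicate gives the first template a higher default priority than the
identity template, so no explicit priority attribute should be needed.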



Jim Albright
704 843-0582
Wycliffe Bible Translators



