xsl-list

Re: Flattening parts of a document hierarchy

2003-10-31 05:03:59
Hi Dave,

> The incoming document is something like this:
>
> <doc>
>     text text text
>     <sec id="sec1">
>         <p>text1 text1 text1</p>
>         <sec id="sec1.1">
>             <p>text2 text2 text2</p>
>         </sec>
>         <p>text3 text3 text3</p>
>     </sec>
> </doc>
>
> i.e. <sec> elements are nested to an arbitrary depth.
>
> I need:
>
> <doc>
>     text text text
>     <section id="s1" level="1">
>         <p>text1 text1 text1</p>
>     </section>
>     <section id="s2" level="2">
>         <p>text2 text2 text2</p>
>     </section>
>     <section id="s3" level="1">
>         <p>text3 text3 text3</p>
>     </section>
> </doc>
> [snip]
> No replies to this; can XSLT really not do this? I've decided to
> address it by pre-filtering with SAX, where this kind of transform
> is pretty easy.

If you're happy using SAX to do it, I think you should do so. This
kind of transformation is really suited to a streaming approach, in
which you go through the elements as they appear and insert start and
end tags as appropriate.

You *can* do it in XSLT by simulating that streaming approach,
stepping through the nodes one by one. When you come across a <sec>
element within the <doc> element, apply templates only to its first
child, in flatten mode:

<xsl:template match="sec">
  <xsl:apply-templates select="node()[1]" mode="flatten" />
</xsl:template>

In flatten mode, most nodes should create a <section> element whose
content is the result of applying templates in copy mode to the node
itself. After the <section> element comes the result of applying
templates in flatten mode to the next sibling <sec> element (the nodes
between this one and that <sec> element are copied into the section
by the copy-mode templates):

<xsl:template match="node()" mode="flatten">
  <section level="{count(ancestor::sec)}">
    <xsl:apply-templates select="." mode="copy" />
  </section>
  <xsl:apply-templates select="following-sibling::sec[1]"
                       mode="flatten" />
</xsl:template>

Processing of <sec> elements in flatten mode is similar to their
processing in the normal mode: you apply templates to the first child
of the <sec> element in flatten mode, but then you go on to process
the <sec> element's next sibling node, also in flatten mode:

<xsl:template match="sec" mode="flatten">
  <xsl:apply-templates select="node()[1]" mode="flatten" />
  <xsl:apply-templates select="following-sibling::node()[1]"
                       mode="flatten" />
</xsl:template>

Processing of most nodes in copy mode is to copy the node itself and
then move on to the next sibling node:

<xsl:template match="node()" mode="copy">
  <xsl:copy-of select="." />
  <xsl:apply-templates select="following-sibling::node()[1]"
                       mode="copy" />
</xsl:template>

Processing of <sec> elements in copy mode is to do nothing, which is
what ends each run of copied nodes at the next <sec>:

<xsl:template match="sec" mode="copy" />
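Put together, the templates above could form a complete stylesheet
along the following lines. This is a sketch rather than a tested
solution: the identity template (so that <doc> and anything outside
<sec> is copied through) and the xsl:strip-space declaration are
assumptions added here, not part of the templates above.

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: whitespace-only text nodes directly inside <sec> are
       insignificant. Without this, whitespace between a nested </sec>
       and its parent's closing tag becomes a <section> holding only
       whitespace. -->
  <xsl:strip-space elements="sec" />

  <!-- Assumption: identity template copies <doc> and everything
       outside <sec> unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Top-level <sec>: start the sibling-by-sibling walk at its first child. -->
  <xsl:template match="sec">
    <xsl:apply-templates select="node()[1]" mode="flatten" />
  </xsl:template>

  <!-- A run of non-<sec> siblings becomes one <section>; then jump to
       the next sibling <sec>. -->
  <xsl:template match="node()" mode="flatten">
    <section level="{count(ancestor::sec)}">
      <xsl:apply-templates select="." mode="copy" />
    </section>
    <xsl:apply-templates select="following-sibling::sec[1]"
                         mode="flatten" />
  </xsl:template>

  <!-- Nested <sec>: walk its children, then carry on with its next sibling. -->
  <xsl:template match="sec" mode="flatten">
    <xsl:apply-templates select="node()[1]" mode="flatten" />
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="flatten" />
  </xsl:template>

  <!-- Copy nodes one at a time... -->
  <xsl:template match="node()" mode="copy">
    <xsl:copy-of select="." />
    <xsl:apply-templates select="following-sibling::node()[1]"
                         mode="copy" />
  </xsl:template>

  <!-- ...until the next <sec>, which ends the run. -->
  <xsl:template match="sec" mode="copy" />

</xsl:stylesheet>

Run over the sample input in the question, this should produce three
<section> elements with levels 1, 2 and 1, each wrapping one <p>, and
no id attributes yet.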

Putting IDs that increment sequentially on the <section> elements
would require a different approach that gives me a headache; I'd do it
by post-processing the result of the above transformation to add the
IDs.
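One way to do that post-processing is a second, separate stylesheet
run over the flattened output. A minimal sketch: copy everything with
an identity template and number each <section> with
<xsl:number level="any"/>, which counts the <section> elements that
come before it in document order, plus itself.

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Copy everything unchanged by default. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <!-- Give each <section> an id of s1, s2, s3, ... in document order.
       With no count attribute, xsl:number counts elements with the
       same name as the context node, i.e. <section>. -->
  <xsl:template match="section">
    <xsl:copy>
      <xsl:attribute name="id">
        <xsl:text>s</xsl:text>
        <xsl:number level="any" />
      </xsl:attribute>
      <!-- Keep the existing level attribute and the content. -->
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

For the flattened sample, this adds id="s1", id="s2" and id="s3" to
the three sections and leaves their level attributes as they were.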

Things are a lot easier in XSLT 2.0, where you can use the
group-adjacent attribute of <xsl:for-each-group> to create groups
based on the id of the parent <sec> element:

<xsl:template match="sec">
  <xsl:for-each-group select="descendant::node()
                                [parent::sec and not(self::sec)]"
                      group-adjacent="parent::sec/@id">
    <section id="s{position()}" level="{count(ancestor::sec)}">
      <xsl:copy-of select="current-group()" />
    </section>
  </xsl:for-each-group>
</xsl:template>

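Note that this approach makes the incrementing IDs very easy to create
as well, since position() gives each group's number. The numbering
restarts for each top-level <sec>, though, so the IDs are only unique
when <doc> contains a single top-level <sec>.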
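For comparison, here is the XSLT 2.0 template inside a complete
stylesheet, again a sketch with an assumed identity template and
xsl:strip-space. One change is deliberate: it groups on
generate-id(..) rather than parent::sec/@id. group-adjacent needs
exactly one key per item, so a <sec> without an id attribute would
give an empty key and raise a dynamic error (XTTE1100), whereas
generate-id(..) always yields a single value and also keeps apart two
<sec> elements that happen to share an id.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: whitespace-only text inside <sec> is insignificant;
       otherwise a whitespace-only run after a nested <sec> forms its
       own group. -->
  <xsl:strip-space elements="sec" />

  <!-- Assumption: identity template copies everything outside <sec>. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="sec">
    <!-- Every non-<sec> node whose parent is a <sec>, in document order;
         consecutive nodes with the same parent form one group. -->
    <xsl:for-each-group select="descendant::node()
                                  [parent::sec and not(self::sec)]"
                        group-adjacent="generate-id(..)">
      <!-- The context item is the group's first node, so
           count(ancestor::sec) is the nesting depth. -->
      <section id="s{position()}" level="{count(ancestor::sec)}">
        <xsl:copy-of select="current-group()" />
      </section>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>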

Cheers,

Jeni

---
Jeni Tennison
http://www.jenitennison.com/


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list