xsl-list

RE: [xsl] Taking flat XML and parsing into multi level nested

2007-08-08 03:52:28

I have some horrible pre-generated source XML which is in this form:

<item>Item Name One</item>
<categoryStart>Category Name One</categoryStart>
<item>Item Name Two</item>
<item>Item Name Three</item>
<categoryStart>Category Name Two</categoryStart>
<item>Item Name Four</item>
<categoryEnd>Category Name Two</categoryEnd>
<item>Item Name Five</item>
<categoryEnd>Category Name One</categoryEnd>
<item>Item Name Six</item>

In XSLT 2.0:

<xsl:template name="do-grouping">
  <xsl:param name="input" as="element()*"/>
  <!-- Group the supplied sequence, not the children of the context node. -->
  <xsl:for-each-group select="$input" group-starting-with="categoryStart">
    <xsl:for-each-group select="current-group()"
                        group-ending-with="categoryEnd">
      <xsl:choose>
        <xsl:when test="current-group()[1][self::categoryStart]">
          <group>
            <xsl:call-template name="do-grouping">
              <xsl:with-param name="input" select="current-group()[self::item]"/>
            </xsl:call-template>
          </group>
        </xsl:when>
        <xsl:when test="current-group()[self::categoryStart]">
          <xsl:call-template name="do-grouping">
            <xsl:with-param name="input" select="current-group()"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:for-each-group>
</xsl:template>

Not tested.

I'm afraid doing a 1.0 solution is pure masochism, so I'll leave that to
others.

Michael Kay
http://www.saxonica.com/
    


Now, in the destination XML, the categories are also items, 
which just indicate another level of nesting, and so the 
above needs to be transformed to something along these lines:

<item>
    <title>Item Name One</title>
</item>
<group>
    <title>Category Name One</title>
    <item>
        <title>Item Name Two</title>
    </item>
    <item>
        <title>Item Name Three</title>
    </item>
    <group>
        <title>Category Name Two</title>
        <item>
            <title>Item Name Four</title>
        </item>
    </group>
    <item>
        <title>Item Name Five</title>
    </item>
</group>
<item>
    <title>Item Name Six</title>
</item>

The way I began to approach this was to use a for-each and 
then a choose, opening the item tag when I found a 
categoryStart and closing on categoryEnd. But the parser 
complained about the XML not being well formed, even though 
it would have been as an end result.

So next I tried a recursive call-template, something like:

<xsl:template name="parseCategoryItems">
    <xsl:param name="nodes" />
    <xsl:for-each select="$nodes">
        <xsl:choose>
            <xsl:when test="name() = 'item'">
                <item identifier="ITEM{position()}">
                    <title><xsl:value-of select="." /></title>
                </item>
            </xsl:when>
            <xsl:when test="name() = 'categoryStart'">
                <item identifier="CITEM{position()}">
                    <xsl:call-template name="parseCategoryItems">
                        <xsl:with-param name="nodes"
                            select="following-sibling::*[.!=??]" />
                    </xsl:call-template>
                </item>
            </xsl:when>
        </xsl:choose>
    </xsl:for-each>
</xsl:template>

All of this is being processed using VBScript in a Word 
document, with XSLT 1.0.

First off, I'm not sure how to stop at the correct category 
end. What I need to do when I recurse is select all the nodes 
between the current node and its matching 'categoryEnd' 
node. Unfortunately, because the source is completely flat, I 
can't use a normal axis selector. I sort of need to be able 
to say "select all following siblings *until* we see a 
categoryEnd with the same value as the current node". At the 
moment the best I managed was selecting all that were *not* 
a categoryEnd, which obviously includes those after the 
matching end.
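
One way to say "all following siblings until the matching end" in XSLT 1.0 
is the node-set intersection idiom. The sketch below is untested and rests 
on assumptions: the flat elements are children of the document element 
(its name is not given, so /* is used), and a start marker and its end 
marker carry exactly the same text. The mode name "between" and the 
<spans>/<span> wrappers are made up for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the flat elements are children of the document element. -->
  <xsl:template match="/*">
    <spans>
      <xsl:apply-templates select="categoryStart" mode="between"/>
    </spans>
  </xsl:template>

  <xsl:template match="categoryStart" mode="between">
    <!-- First later categoryEnd whose text equals this categoryStart's text. -->
    <xsl:variable name="end"
        select="following-sibling::categoryEnd[. = current()][1]"/>
    <!-- Every element sibling that precedes that end marker. -->
    <xsl:variable name="before-end" select="$end/preceding-sibling::*"/>
    <span name="{.}">
      <!-- Intersection: following siblings that are also in $before-end. -->
      <xsl:copy-of
          select="following-sibling::*[count(. | $before-end) = count($before-end)]"/>
    </span>
  </xsl:template>

</xsl:stylesheet>
```

If no matching categoryEnd exists, $before-end is empty and nothing is 
selected. The selection still contains any nested start and end markers, 
so it addresses the "until" part only, not the nesting.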

Secondly, I need to *not* process nodes that have already been done.
For clarification, when I run what I have now it nests the 
items (all the following-siblings, as I don't know how to 
select correctly) *and* it prints them again below the nested 
version. So basically, is there a way to remove them from 
the loop I have when I return from the recursive call?
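
A common XSLT 1.0 answer to both points is "sibling recursion": each 
template handles one node and then hands over to the next sibling, so no 
node is visited twice. What follows is an untested sketch, not a definitive 
implementation. It assumes the flat elements sit under a single parent 
(matched here as /*), that matching start and end markers carry the same 
text, and that a category name is not reused while a category of that name 
is still open. The <result> wrapper is made up; <group> and <title> follow 
the destination example above.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Start the walk at the first flat element only. -->
  <xsl:template match="/*">
    <result>
      <xsl:apply-templates select="*[1]"/>
    </result>
  </xsl:template>

  <!-- Emit the item, then move on to the immediately following sibling. -->
  <xsl:template match="item">
    <item>
      <title><xsl:value-of select="."/></title>
    </item>
    <xsl:apply-templates select="following-sibling::*[1]"/>
  </xsl:template>

  <xsl:template match="categoryStart">
    <group>
      <title><xsl:value-of select="."/></title>
      <!-- Walk the contents; the walk stops at the first categoryEnd. -->
      <xsl:apply-templates select="following-sibling::*[1]"/>
    </group>
    <!-- Resume after the end marker whose text matches this start. -->
    <xsl:apply-templates
        select="following-sibling::categoryEnd[. = current()][1]
                /following-sibling::*[1]"/>
  </xsl:template>

  <!-- An end marker produces nothing and does not continue the walk. -->
  <xsl:template match="categoryEnd"/>

</xsl:stylesheet>
```

Because every node calls the next one, recursion depth grows with the 
number of siblings, so a very long flat list may run into a processor's 
recursion limit.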

I've had to simplify the examples from what I really have, 
but if anyone can give me any hints on how to progress, 
including completely different approaches, then that would be 
fantastic, because I am currently out of ideas.

Many thanks,
Paul

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


