xsl-list

[xsl] Taking flat XML and parsing into multi-level nested

2007-08-08 01:15:26
I have some horrible pre-generated source XML which is in this form:

<item>Item Name One</item>
<categoryStart>Category Name One</categoryStart>
<item>Item Name Two</item>
<item>Item Name Three</item>
<categoryStart>Category Name Two</categoryStart>
<item>Item Name Four</item>
<categoryEnd>Category Name Two</categoryEnd>
<item>Item Name Five</item>
<categoryEnd>Category Name One</categoryEnd>
<item>Item Name Six</item>

Now, in the destination XML, the categories are also items, which just
indicate another level of nesting, and so the above needs to be
transformed to something along these lines:

<item>
    <title>Item Name One</title>
</item>
<group>
    <title>Category Name One</title>
    <item>
        <title>Item Name Two</title>
    </item>
    <item>
        <title>Item Name Three</title>
    </item>
    <group>
        <title>Category Name Two</title>
        <item>
            <title>Item Name Four</title>
        </item>
    </group>
    <item>
        <title>Item Name Five</title>
    </item>
</group>
<item>
    <title>Item Name Six</title>
</item>

The way I began to approach this was to use a for-each and then a
choose, opening the tag when I found a categoryStart and closing it
on categoryEnd. But the parser complained that the stylesheet was not
well formed, even though the end result would have been.

So next I tried a recursive call-template, something like:

<xsl:template name="parseCategoryItems">
    <xsl:param name="nodes" />
    <xsl:for-each select="$nodes">
        <xsl:choose>
            <xsl:when test="name() = 'item'">
                <item identifier="ITEM{position()}">
                    <title><xsl:value-of select="." /></title>
                </item>
            </xsl:when>
            <xsl:when test="name() = 'categoryStart'">
                <item identifier="CITEM{position()}">
                    <xsl:call-template name="parseCategoryItems">
                        <xsl:with-param name="nodes"
                            select="following-sibling::*[.!=??]" />
                    </xsl:call-template>
                </item>
            </xsl:when>
        </xsl:choose>
    </xsl:for-each>
</xsl:template>

All of this is being processed using VBScript in a Word document, with
XSLT 1.0.

First off, I'm not sure how to stop at the correct category end. What
I need to do when I recurse is select all the nodes between the
current node and its matching 'categoryEnd' node. Unfortunately,
because the source is completely flat, I can't use a normal axis
selector. I need to be able to say "select all following
siblings *until* we see a categoryEnd with the same value as the
current node". At the moment the best I have managed is selecting all
that are *not* a categoryEnd, which obviously includes those after it.
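
A rough sketch of that "until" selection in XSLT 1.0, for illustration
only. It assumes the flat elements sit under a wrapper element called
"list" and that a category name is never reused inside itself; the
"selected" and "between" output elements and the variable names are
made up for the example. It intersects the following siblings of the
categoryStart with the preceding siblings of its matching categoryEnd.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumption: the flat elements are children of a wrapper named "list". -->
  <xsl:template match="/list">
    <selected>
      <xsl:apply-templates select="categoryStart"/>
    </selected>
  </xsl:template>

  <xsl:template match="categoryStart">
    <!-- First later categoryEnd whose text equals this categoryStart's text. -->
    <xsl:variable name="end"
        select="following-sibling::categoryEnd[. = current()][1]"/>
    <!-- Every sibling that comes before that end marker. -->
    <xsl:variable name="before-end" select="$end/preceding-sibling::*"/>
    <between name="{.}">
      <!-- Keep only following siblings that are also in $before-end
           (node-set intersection via count(), XSLT 1.0 style). -->
      <xsl:copy-of
          select="following-sibling::*[count(. | $before-end) = count($before-end)]"/>
    </between>
  </xsl:template>

</xsl:stylesheet>
```

For "Category Name One" in the sample above this would pick up
everything from Item Name Two through Item Name Five, including the
nested categoryStart and categoryEnd markers, so on its own it does
not deal with the nesting.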

Secondly, I need to *not* process nodes that have already been done.
For clarification, when I run what I have now it nests the items (all
the following siblings, as I don't know how to select correctly) *and*
it prints them again below the nested version. So basically, is
there a way to remove them from the loop once I return from
the recursive call?
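
One approach that might avoid the double processing is to drop the
for-each and walk the siblings one at a time ("sibling recursion"),
so each node is visited exactly once. The following is a minimal
sketch under assumptions, not a tested solution for the real data:
the wrapper element "list", the output root "items" and the mode name
"walk" are invented for the example, and a categoryEnd is matched to
its categoryStart purely by text value.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the flat elements are children of a wrapper named "list". -->
  <xsl:template match="/list">
    <items>
      <!-- Start with the first child only; each template passes control on. -->
      <xsl:apply-templates select="*[1]" mode="walk"/>
    </items>
  </xsl:template>

  <!-- Plain item: emit it, then move on to the next sibling. -->
  <xsl:template match="item" mode="walk">
    <item>
      <title><xsl:value-of select="."/></title>
    </item>
    <xsl:apply-templates select="following-sibling::*[1]" mode="walk"/>
  </xsl:template>

  <!-- Category start: open a group and walk its contents inside it. -->
  <xsl:template match="categoryStart" mode="walk">
    <group>
      <title><xsl:value-of select="."/></title>
      <xsl:apply-templates select="following-sibling::*[1]" mode="walk"/>
    </group>
    <!-- After the group, resume just past the matching categoryEnd. -->
    <xsl:apply-templates
        select="following-sibling::categoryEnd[. = current()][1]
                /following-sibling::*[1]"
        mode="walk"/>
  </xsl:template>

  <!-- Category end: stop walking, which closes the enclosing group. -->
  <xsl:template match="categoryEnd" mode="walk"/>

</xsl:stylesheet>
```

Because the walk inside a group stops at the first categoryEnd it
reaches, and the categoryStart template then resumes after its own
end marker, nothing should be emitted twice. The recursion goes one
level deeper per sibling, so a very long flat list could run into a
processor's recursion limit.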

I've had to simplify the examples from what I really have, but if
anyone can give me any hints on how to progress, including completely
different approaches, then that would be fantastic, because I am
currently out of ideas.

Many thanks,
Paul
