xsl-list

Re: [xsl] How to do this tricky elimination on XML using XSLT 2.0?

2012-06-19 09:58:00
Dear Dr. Kay,
Thank you for your guidance.
I modified the solution into:

    <xsl:variable name="removed-nodes" as="element(*)*">
        <xsl:for-each-group select="//blockA/*"
                group-by="concat(@id, '~', @method, '~', otherchild)">
            <xsl:sequence select="subsequence(current-group(), 2)"/>
        </xsl:for-each-group>
    </xsl:variable>

    <xsl:template match="@* | node()">
        <xsl:if test="empty(. intersect $removed-nodes)">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:if>
    </xsl:template>

It is almost correct; I just need to address two things:

1. Every time a successive node with the `same id` has a `different method`,
   the `boundary` for the next removal for that `id` is reset.
2. The removal must not combine nodes under two different ancestors
   (<gridA id="1"> and <gridA id="2">).

**for example:**

    <elem id="1" method="a" />
    <elem id="1" method="a" /> <!-- this is repetitive for elem id=1
and will be removed -->
    <elem id="1" method="b" />
    <elem id="1" method="a" /> <!-- this is the new boundary for
removal elem id=1 and will not be removed -->
    <elem id="2" method="a" />
    <elem id="1" method="a" /> <!-- this is repetitive for elem id=1
and will be removed -->
    <elem id="2" method="a" /> <!-- this is repetitive for elem id=2
and will be removed -->

**will be simplified into:**

    <elem id="1" method="a" />
    <elem id="1" method="b" />
    <elem id="1" method="a" /> <!-- this is the new boundary for
removal elem id=1 and will not be removed -->
    <elem id="2" method="a" />
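
One direction that might cover both points, as an untested sketch rather than a confirmed solution: run the grouping once per gridA so that removals never cross grids, group the blockA children by name and id, and then apply group-adjacent on method plus otherchild so that a different method starts a new run. The `//gridA` and `blockA/*` paths, and the use of a single `otherchild` as the stand-in for "children", are assumptions taken from the sample input; other grids or richer content would need a wider select and key.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Pass 1: collect the nodes to remove, one gridA at a time.
       Assumption: candidates are the children of blockA, and each has
       at most one otherchild (concat() fails on more than one). -->
  <xsl:variable name="removed-nodes" as="element()*">
    <xsl:for-each select="//gridA">
      <!-- All elements with the same name and id, in document order. -->
      <xsl:for-each-group select="blockA/*"
          group-by="concat(name(), '~', @id)">
        <!-- Within that name/id group, a change of method or otherchild
             starts a new run; only the first node of each run is kept. -->
        <xsl:for-each-group select="current-group()"
            group-adjacent="concat(@method, '~', otherchild)">
          <xsl:sequence select="subsequence(current-group(), 2)"/>
        </xsl:for-each-group>
      </xsl:for-each-group>
    </xsl:for-each>
  </xsl:variable>

  <!-- Pass 2: identity transform for everything else. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Drop any element that pass 1 marked for removal. -->
  <xsl:template match="*[exists(. intersect $removed-nodes)]"/>
</xsl:stylesheet>
```

Tracing this by hand on the example above, the id=1 group is a, a, b, a, a, which gives the runs (a, a), (b), (a, a), so the second and fifth nodes are removed; the id=2 group is a, a, so its second node is removed. A blockA that loses all of its children stays as an empty element.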


Please let me know how I can achieve such things. Thanks very much once again.



On Tue, Jun 19, 2012 at 9:20 PM, Michael Kay <mike(_at_)saxonica(_dot_)com> 
wrote:
I think I would tackle this in two passes. First use xsl:for-each-group to
identify the nodes to be removed; then do a modified identity transform that
retains only the nodes not in this list.

The first pass is something like this:

<!--  **Two node that have the same `name` and `id` will be considered
*repetitive* if it appears one after another and it has the same `method`
and `children`.** -->
<xsl:variable name="removed-nodes" as="element(*)*">
  <xsl:for-each-group select="//blockA/*"
      group-by="concat(@id, '~', @method, '~', otherchild)">
    <xsl:sequence select="subsequence(current-group(), 2)"/>
  </xsl:for-each-group>
</xsl:variable>

The second pass is:

<xsl:template match="*">
  <xsl:if test="empty(. intersect $removed-nodes)">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:if>
</xsl:template>

Michael Kay
Saxonica


On 19/06/2012 10:14, Jo Na wrote:

Hi,
I have this input xml:
    <map>
        <region>
            <gridA id="1">
                <blockA id="01" method="build">
                    <building1 id="x" method="build">
                        <otherchild>a</otherchild>
                    </building1>
                    <building1 id="x" method="build">  <!-- this one
will be removed -->
                        <otherchild>a</otherchild>
                    </building1>
                </blockA>

                <blockA id="01">
                    <building1 id="x" method="modify">
                        <otherchild>a</otherchild>
                    </building1>
                    <building1 id="x" method="build">  <!-- this one
will be kept (prev node has same id but diff method, so it's not
considered successive) -->
                        <otherchild>a</otherchild>
                    </building1>
                </blockA>

                <blockA id="02">
                    <building3 id="y" method="modify">
                        <otherchild>b</otherchild>
                    </building3>
                    <building2 id="x" method="demolish"/>
                </blockA>

                <blockA id="01">
                    <building1 id="y" method="build">  <!-- this one
will be kept (diff id) -->
                        <otherchild>a</otherchild>
                    </building1>
                    <building1 id="x" method="build">  <!-- this one
will be removed -->
                        <otherchild>a</otherchild>
                    </building1>
                </blockA>

                <blockA id="02">
                    <building3 id="y" method="modify">  <!-- this one
will be removed -->
                        <otherchild>b</otherchild>
                    </building3>
                    <building2 id="x" method="demolish"/>  <!-- this
one will be removed -->
                </blockA>
            </gridA>

            <gridA id="2">
                <blockA id="01" method="build">
                    <building1 id="x" method="build">
                        <otherchild>a</otherchild>
                    </building1>
                    <building1 id="x" method="build">  <!-- this one
will be removed -->
                        <otherchild>a</otherchild>
                    </building1>
                    <building1 id="x" method="build">  <!-- this one
will be kept (diff children) -->
                        <otherchild>b</otherchild>
                    </building1>
                </blockA>
                <blockA id="01">
                    <building1 id="x" method="build">  <!-- this one
will be removed -->
                        <otherchild>b</otherchild>
                    </building1>
                </blockA>
            </gridA>
            <gridB id="1">
                ...and so on..
            </gridB>
        </region>
    </map>

Expected Output:

    <map>
        <region>
            <gridA id="1">
                <blockA id="01" method="build">
                    <building1 id="x" method="build">
                        <otherchild>a</otherchild>
                    </building1>
                </blockA>

                <blockA id="01">
                    <building1 id="x" method="modify">
                        <otherchild>a</otherchild>
                    </building1>
                    <building1 id="x" method="build">  <!-- this one
will be kept (prev node has same id but diff method, so it's not
considered successive) -->
                        <otherchild>a</otherchild>
                    </building1>
                </blockA>

                <blockA id="02">
                    <building3 id="y" method="modify">
                        <otherchild>b</otherchild>
                    </building3>
                    <building2 id="x" method="demolish"/>
                </blockA>

                <blockA id="01">
                    <building1 id="y" method="build">  <!-- this one
will be kept (diff id) -->
                        <otherchild>a</otherchild>
                    </building1>
                </blockA>

                <blockA id="02"/>
            </gridA>

            <gridA id="2">
                <blockA id="01" method="build">
                    <building1 id="x" method="build">
                        <otherchild>a</otherchild>
                    </building1>

                    <building1 id="x" method="build">  <!-- this one
will be kept (diff children) -->
                        <otherchild>b</otherchild>
                    </building1>
                </blockA>
                <blockA id="01"/>
            </gridA>
            <gridB id="1">
                ...and so on..
            </gridB>
        </region>
    </map>
The XSLT so far:

    <xsl:stylesheet version="2.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output indent="yes"/>
        <xsl:strip-space elements="*"/>

        <xsl:template match="node()|@*">
            <xsl:copy>
                <xsl:apply-templates select="node()|@*"/>
            </xsl:copy>
        </xsl:template>

        <xsl:template match="region/*/*/*
             [deep-equal(.,preceding::*[name()=current()/name()]
                           [@id = current()/@id]
                           [../../@id = current()/../../@id][1])]" />
    </xsl:stylesheet>

The problem with the XSLT right now is that it does not correctly handle
duplicates that occur across siblings (i.e. blockA elements with the same id).

I need to remove nodes that are considered *repetitive*.

**Two nodes that have the same `name` and `id` are considered
*repetitive* if they appear one after another and have the same
`method` and `children`.**

**for example:**

    <elem id="1" method="a" />
    <elem id="1" method="a" />  <!-- this is repetitive for id=1-->
    <elem id="1" method="b" />
    <elem id="1" method="a" />  <!-- this is the new boundary for removal
id=1-->
    <elem id="2" method="a" />
    <elem id="1" method="a" />  <!-- this is repetitive for id=1 -->
    <elem id="2" method="a" />  <!-- this is repetitive for id=2 -->

**will be simplified into:**

    <elem id="1" method="a" />
    <elem id="1" method="b" />
    <elem id="1" method="a" />  <!-- this is the new boundary for removal
id=1-->
    <elem id="2" method="a" />

 - **Every time a successive node with the `same id` has a `different
   method`, the `boundary` for the next removal for that `id` is reset.**
 - We need to take into account duplicates under one parent and under
   sibling parents (two or more parent nodes that have the same element
   name and id; `blockX` in the example).
 - If the two nodes being compared do not share the same `gridX`
   ancestor, they should not be considered duplicates to be removed.
Please let me know how to achieve such transformation using XSLT 2.0.
Thanks very much for the help.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or 
e-mail:<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


