RE: [xsl] removing duplicate elements based on two or more children


In any question involving grouping, you need to make it clear whether you
are working with XSLT 1.0 or 2.0. It will of course be much easier with 2.0.

However, neither the 2.0 xsl:for-each-group, nor the Muenchian grouping
technique which is used with XSLT 1.0, make it easy to work with a variable
number of grouping keys. The preferred way of doing that is by recursion,
grouping first on one key, then on the next, and so on. Alternatively you could
consider doing it by writing a function that computes a single (composite)
grouping key as some kind of string with internal structure.
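A minimal sketch of the recursive approach follows, under stated assumptions. It is written in Python with the standard library's xml.etree.ElementTree rather than in XSLT so that it runs standalone. The names SAMPLE, term, _group and dedupe_recursive are hypothetical. It also assumes that a missing language counts as an empty term. The sketch groups records on the term for the first language, splits each group on the next language, and so on. It keeps the first record of each final group. In XSLT 2.0 the comparable structure would be a recursive template that applies xsl:for-each-group once per language.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input built from the records in the question.
SAMPLE = """<body>
  <record id="1"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bank</term></lang></record>
  <record id="2"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bench</term></lang></record>
  <record id="3"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bank</term></lang></record>
</body>"""


def term(record, lang_id):
    """Term text for lang_id; '' if the language is absent (an assumption)."""
    t = record.find(f"lang[@id='{lang_id}']/term")
    return (t.text or "") if t is not None else ""


def _group(records, langs):
    """Group on langs[0], recurse on the remaining languages."""
    if not langs:
        # All records here agree on every language: keep only the first.
        return records[:1]
    groups = {}  # dicts keep insertion order, i.e. order of first appearance
    for r in records:
        groups.setdefault(term(r, langs[0]), []).append(r)
    kept = []
    for group in groups.values():
        kept.extend(_group(group, langs[1:]))
    return kept


def dedupe_recursive(records, langs):
    """Drop duplicate records, then restore document order."""
    kept = {id(r) for r in _group(records, langs)}
    return [r for r in records if id(r) in kept]


root = ET.fromstring(SAMPLE)
records = root.findall("record")
langs = sorted({lang.get("id") for lang in root.iter("lang")})  # every language seen
print([r.get("id") for r in dedupe_recursive(records, langs)])
```

Because the language list is computed from the data, the same code handles any number of languages. The final filter over the original list keeps the surviving records in document order, because the recursion itself returns them in group order.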

Regards,

Michael Kay
http://www.saxonica.com/
http://twitter.com/michaelhkay

-----Original Message-----
From: Manuel Souto Pico [mailto:manuel(_dot_)souto(_at_)star-group(_dot_)net] 
Sent: 25 August 2009 15:46
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] removing duplicate elements based on two or more children

Hi,

I had tried to get this done some time ago, but due to lack
of time and the difficulty of the task, I dropped it and did
it by other, less elegant means. Now I need it again, and I
think it deserves another chance. With a bit of help I'm sure
I'll manage to get it done, and it will be extremely useful
both for me and, I guess, for a lot of other people.

Here is a simplified explanation. In an XML file I have records,
which contain languages, which contain terms. The path down 
to any term would be /doc/body/text/record/lang/term. For example:

<record id="1">
    <lang id="fra">
        <term>banque</term>
    </lang>
    <lang id="eng">
        <term>bank</term>
    </lang>
</record>
<record id="2">
    <lang id="fra">
        <term>banque</term>
    </lang>
    <lang id="eng">
        <term>bench</term>
    </lang>
</record>
<record id="3">
    <lang id="fra">
        <term>banque</term>
    </lang>
    <lang id="eng">
        <term>bank</term>
    </lang>
</record>

As you can see, the French term is the same in the three 
records. If we applied a duplicate-removal function based on
the French term, we would end up with only one record. 
However, what I need is to remove duplicates taking into 
account the terms in all languages, here only two (French and 
English), but it should be extensible to n languages.
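One way to state "all languages, extensible to n" precisely is with a single composite key per record. The key is the sorted list of (language id, terms) pairs, and a record is dropped when an earlier record has the same key. This is the composite-key alternative mentioned in the reply above. The following is a minimal Python sketch, not XSLT, and the names SAMPLE, composite_key and dedupe_by_key are hypothetical. In this sample, record 3 lists its languages in a different order, and the hypothetical record 4 adds a third language ("deu"). Under this rule, record 4 counts as a distinct concept.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: record 3 repeats record 1 with the languages swapped,
# record 4 adds a third language.
SAMPLE = """<body>
  <record id="1"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bank</term></lang></record>
  <record id="2"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bench</term></lang></record>
  <record id="3"><lang id="eng"><term>bank</term></lang><lang id="fra"><term>banque</term></lang></record>
  <record id="4"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bank</term></lang><lang id="deu"><term>Bank</term></lang></record>
</body>"""


def composite_key(record):
    """Sorted (lang id, terms) pairs, so the order of lang elements is irrelevant."""
    return tuple(sorted(
        (lang.get("id"), tuple(t.text or "" for t in lang.findall("term")))
        for lang in record.findall("lang")
    ))


def dedupe_by_key(records):
    """Keep the first record for each distinct composite key, in document order."""
    seen = set()
    kept = []
    for r in records:
        key = composite_key(r)
        if key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


root = ET.fromstring(SAMPLE)
records = root.findall("record")
print([r.get("id") for r in dedupe_by_key(records)])
```

Using a tuple sidesteps the separator problem. If the key is built as a string instead, for example with string-join() in XSLT 2.0 and passed to the group-by attribute of xsl:for-each-group, the separator must be a character that cannot occur in a term.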

The expected outcome would contain only records 1 and 2 (that
is, two distinct concepts, the bank and the bench). Notice
that it is the whole parent record element that must not be
generated if the terms it contains are duplicates, not just
the children, so this is not the ideal outcome:

<record id="1">
    <lang id="fra">
        <term>banque</term>
    </lang>
    <lang id="eng">
        <term>bank</term>
    </lang>
</record>
<record id="2">
    <lang id="fra">
        <term>banque</term>
    </lang>
    <lang id="eng">
        <term>bench</term>
    </lang>
</record>
<record id="3">
    <lang id="fra"/>
    <lang id="eng"/>
</record>

I've tried using <xsl:apply-templates
select="child::seg[not(. = preceding-sibling::seg)]" /> but, say,
/record[@id=1]/lang[@id="fra"]/term and
/record[@id=3]/lang[@id="fra"]/term are not siblings because
they don't have the same parent. Perhaps this wasn't the best 
way to go anyway.
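The sibling idea can work one level up. The record elements are siblings of each other, so each record can be tested against all of its preceding sibling records as a whole, instead of comparing terms across different parents. The following is a minimal Python sketch of that idea, not an XSLT solution. The names SAMPLE, terms_by_lang and dedupe_by_preceding are hypothetical. It treats two records as equal when they have the same languages with the same terms, regardless of lang order.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input built from the records in the question.
SAMPLE = """<body>
  <record id="1"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bank</term></lang></record>
  <record id="2"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bench</term></lang></record>
  <record id="3"><lang id="fra"><term>banque</term></lang><lang id="eng"><term>bank</term></lang></record>
</body>"""


def terms_by_lang(record):
    """Map each lang id to its list of term texts."""
    return {lang.get("id"): [t.text or "" for t in lang.findall("term")]
            for lang in record.findall("lang")}


def dedupe_by_preceding(records):
    """Keep a record only if no earlier sibling record has the same terms."""
    kept = []
    for i, r in enumerate(records):
        mine = terms_by_lang(r)
        # Analogue of preceding-sibling::record: every record before this one.
        if not any(terms_by_lang(p) == mine for p in records[:i]):
            kept.append(r)
    return kept


root = ET.fromstring(SAMPLE)
records = root.findall("record")
print([r.get("id") for r in dedupe_by_preceding(records)])
```

This pairwise scan is quadratic in the number of records, which is one reason the key-based grouping approaches scale better on large files.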

Any suggestion would be more than welcome. Thanks a lot.

Have a nice evening,

--
Manuel Souto Pico

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--