xsl-list

RE: how to remove duplicates from more than one file?

2002-12-09 02:44:28
Your code works fine with Saxon. For reference, here is the complete
stylesheet:

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="ISO-8859-1"
indent="yes"/>

<xsl:key name="items" match="item" use="@name"/>

<xsl:variable name="source">
     <xsl:copy-of select="document('test1.xml')//item | 
document('test2.xml')//item"/>
</xsl:variable>

<xsl:template match="/">
<xsl:for-each select="$source">
     <xsl:for-each 
select="//item[generate-id(.)=generate-id(key('items', @name)[1])]">
         <xsl:copy-of select="."/>
     </xsl:for-each>
</xsl:for-each>
</xsl:template>

</xsl:stylesheet>
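
Strictly speaking, XSLT 1.0 does not allow a variable built from content (a
result tree fragment) to be used directly as a node-set in xsl:for-each, so
processors that enforce this need an extension function. What follows is only
a sketch of that variant, under the assumption that the processor implements
the EXSLT exsl:node-set() function (not every processor does). The exsl
namespace declaration and the itemList wrapper element (taken from the
desired output in the original question) are the only additions:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
exclude-result-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="ISO-8859-1"
indent="yes"/>

<xsl:key name="items" match="item" use="@name"/>

<!-- Result tree fragment holding copies of every item from both files -->
<xsl:variable name="source">
     <xsl:copy-of select="document('test1.xml')//item |
document('test2.xml')//item"/>
</xsl:variable>

<xsl:template match="/">
<itemList>
<!-- ASSUMPTION: exsl:node-set() is available; it turns the fragment
     into a node-set so that for-each and key() can work on it -->
<xsl:for-each select="exsl:node-set($source)">
     <xsl:for-each
select="//item[generate-id(.)=generate-id(key('items', @name)[1])]">
         <xsl:copy-of select="."/>
     </xsl:for-each>
</xsl:for-each>
</itemList>
</xsl:template>

</xsl:stylesheet>
```

The outer xsl:for-each only switches the context to the converted tree, so
that key() looks up items in that tree rather than in the source document.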

I can't think of any better way of doing it in XSLT 1.0. Neither the
Muenchian approach nor the preceding-sibling approach to elimination of
duplicates can handle multiple documents directly. You could tune it
slightly by removing the "//". In 2.0, of course, you can use
xsl:for-each-group or the new distinct-values() function.
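
As a rough illustration of the 2.0 route, here is a minimal sketch using
xsl:for-each-group. It assumes an XSLT 2.0 processor, reuses the file names
test1.xml and test2.xml from the stylesheet above, and takes the itemList
wrapper element from the desired output in the original question:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" encoding="ISO-8859-1" indent="yes"/>

<xsl:template match="/">
<itemList>
     <!-- Group the items from both documents by their name attribute -->
     <xsl:for-each-group
select="(document('test1.xml') | document('test2.xml'))//item"
group-by="@name">
         <!-- The context item is the first item of the current group -->
         <xsl:copy-of select="."/>
     </xsl:for-each-group>
</itemList>
</xsl:template>

</xsl:stylesheet>
```

No intermediate variable or key is needed here, because the grouping works
directly on a sequence drawn from several documents. distinct-values(), by
contrast, returns atomic values rather than nodes, so on its own it would
yield only the distinct names, not the item elements.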

Michael Kay
Software AG
home: Michael(_dot_)H(_dot_)Kay(_at_)ntlworld(_dot_)com
work: Michael(_dot_)Kay(_at_)softwareag(_dot_)com 

-----Original Message-----
From: owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com 
[mailto:owner-xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com] On Behalf Of 
Marcin Antczak
Sent: 07 December 2002 21:04
To: XSL-List(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] how to remove duplicates from more than one file?


My input is:

root.xml

<root/>


items_1.xml

<items>
      <item name='one'>value_1</item>
      <item name='one'>value_1</item>
      <item name='two'>value_2</item>
      <item name='three'>value_3</item>
</items>


items_2.xml (and more.... items_*.xml)

<items>
      <item name='one'>value_1</item>
      <item name='one'>value_1</item>
      <item name='two'>value_2</item>
      <item name='two'>value_2</item>
      <item name='one'>value_1</item>
      <item name='seven'>value_7</item>
</items>


And I need to generate output with items from all input files without 
duplicates:

<itemList>
      <item name='one'>value_1</item>
      <item name='two'>value_2</item>
      <item name='three'>value_3</item>
      <item name='seven'>value_7</item>
</itemList>

My first idea was to grab external data with the document() function
into a variable and then use the Muenchian method on the node-set
within this variable.

In my stylesheet I did something like this:

<xsl:key name="items" match="item" use="@name"/>

<xsl:variable name="source">
     <xsl:copy-of select="document('items_1.xml')//item | 
document('items_2.xml')//item"/>
</xsl:variable>

<xsl:for-each select="$source">
     <xsl:for-each 
select="//item[generate-id(.)=generate-id(key('items', @name)[1])]">
         <test_ok/>
     </xsl:for-each>
</xsl:for-each>

But on my Windows machine (Win 2000 + IIS 5.0 + PHP 4.2.3 + Sablotron
0.96, server-side transformations) I get only segfaults.

On a Unix machine (FreeBSD) there were no errors, but no output at all
either.

Could you give me a hint on how to resolve this problem?


Marcin Antczak











 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


