Michael,
> I haven't understood your logic in any detail, but I wonder if it
> suggests an alternative approach to the problem: namely, avoid creating
> RTFs entirely, at least for intermediate results. Instead, whenever you
> are evaluating an operation that returns a node-set, represent that
> node-set as a string containing the generate-id values of the nodes in
> the node-set, space-separated. Elimination of duplicates then reduces to
> an operation on strings: not trivial, but not especially difficult
> either.
That is a cool idea.
Reading your suggestion of a whitespace-separated list of strings, I
thought of the id() function.
This function can do the duplicate elimination "for free"!
Given a document whose DOCTYPE declares an attribute of type ID, and a
whitespace-separated string of ids, a call of id() with that string not
only returns all the nodes with the given ids; it also eliminates
duplicate nodes.
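To make that concrete, here is a minimal sketch. It assumes a hypothetical input file ids.xml (shown in the leading comment) whose internal DTD subset declares the id attribute as type ID; the file name, element names, and id values are made up for illustration.

```xml
<!-- Hypothetical input ids.xml:

<!DOCTYPE list [ <!ATTLIST item id ID #REQUIRED> ]>
<list><item id="a"/><item id="b"/><item id="c"/></list>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <!-- "a" and "b" each occur twice in this string -->
    <xsl:variable name="ids" select="'b a b c a'"/>
    <!-- id() splits the string on white space and returns the matching
         elements as a node-set, so no element can occur in it twice -->
    <xsl:copy-of select="id($ids)"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to ids.xml, this should copy each of the three item elements exactly once, in document order, although a and b are named twice in the string.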
I figured out how to emit the DOCTYPE definition with xsl:text while
creating the output. Generating such an output XML file works perfectly,
as can be seen in the demo idc.xsl [1] and below.
File idc2.xml is the output generated by calling template idcopy for file
simple2.xml.
The big question now is whether exslt:node-set() supports DOCTYPE
definitions, and how. idc.xsl shows an attempt that does not work.
Accessing an element by its id works for document('idc2.xml') but
does not work for document(exslt:node-set($rtf)), although both are
generated identically by a call to template idcopy.
The difference seems to be that idc2.xml is parsed from a file ...
Is DOCTYPE supported by exslt:node-set()?
Is generating the DOCTYPE with <xsl:text> OK for this purpose?
Can the id() function be made to work for duplicate elimination
in some other way?
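For comparison, the string-only approach from Michael's suggestion might look roughly like the following. This is only a sketch: the template name distinct-ids, its parameters, and the sample id values are hypothetical and not part of idc.xsl.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: writes the tokens of $ids with repeats dropped,
       keeping the first occurrence of each. Every token that is written
       is followed by one space. -->
  <xsl:template name="distinct-ids">
    <xsl:param name="ids"/>
    <!-- tokens handled so far, each delimited by spaces on both sides -->
    <xsl:param name="seen" select="' '"/>
    <!-- normalize, then append a space so every token is space-terminated -->
    <xsl:variable name="rest" select="concat(normalize-space($ids), ' ')"/>
    <xsl:variable name="first" select="substring-before($rest, ' ')"/>
    <xsl:if test="$first != ''">
      <!-- write the token only if it has not been handled before -->
      <xsl:if test="not(contains($seen, concat(' ', $first, ' ')))">
        <xsl:value-of select="concat($first, ' ')"/>
      </xsl:if>
      <!-- recurse on the remaining tokens -->
      <xsl:call-template name="distinct-ids">
        <xsl:with-param name="ids" select="substring-after($rest, ' ')"/>
        <xsl:with-param name="seen" select="concat($seen, $first, ' ')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="distinct-ids">
      <xsl:with-param name="ids" select="'id3 id1 id3 id2 id1'"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

For the sample string the template should write "id3 id1 id2 ". Unlike id(), it keeps first-occurrence order rather than document order, and the containment test makes it quadratic in the number of ids.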
$ xsltproc idc.xsl simple2.xml
----------
<node id="id2335172" type="text" value="4"/>
$ cat simple2.xml
<a>
<b>
<c>1</c>
<c>2</c>
</b>
<b>
<c>3</c>
<c>4</c>
</b>
</a>
$ cat idc2.xml
<!DOCTYPE node [ <!ATTLIST node id ID #REQUIRED> ]>
<node id="id2335401" type="element" name="a"><node id="id2335402"
type="text" value=" "/><node id="id2335404" type="element"
name="b"><node id="id2335405" type="text" value=" "/><node
id="id2335406" type="element" name="c"><node id="id2335407" type="text"
value="1"/></node><node id="id2335408" type="text" value=" "/><node
id="id2335409" type="element" name="c"><node id="id2335162" type="text"
value="2"/></node><node id="id2335163" type="text" value=" "/>
</node><node id="id2335164" type="text" value=" "/><node
id="id2335165" type="element" name="b"><node id="id2335166" type="text"
value=" "/><node id="id2335167" type="element" name="c"><node
id="id2335168" type="text" value="3"/></node><node id="id2335169"
type="text" value=" "/><node id="id2335170" type="element"
name="c"><node id="id2335172" type="text" value="4"/></node><node
id="id2335173" type="text" value=" "/></node><node id="id2335174"
type="text" value=" "/></node>
$
$ cat idc.xsl
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exslt="http://exslt.org/common"
exclude-result-prefixes="exslt">
<xsl:output omit-xml-declaration="yes"/>
<xsl:key name="nodes-by-id" match="node()" use="@id"/>
<xsl:template match="/">
<xsl:variable name="rtf">
<xsl:call-template name="idcopy"/>
</xsl:variable>
<xsl:variable name="id1" select=
"string(exslt:node-set($rtf)//node[@type='text'][@value='4']/@id)"/>
<xsl:for-each select="document(exslt:node-set($rtf))">
<xsl:copy-of select="id($id1)"/>
</xsl:for-each>
<xsl:text> ---------- </xsl:text>
<xsl:variable name="id2" select=
"string(document('idc2.xml')//node[@type='text'][@value='4']/@id)"/>
<xsl:for-each select="document('idc2.xml')">
<xsl:copy-of select="id($id2)"/>
</xsl:for-each>
</xsl:template>
<xsl:template name="idcopy">
<xsl:text disable-output-escaping="yes">
<![CDATA[<!DOCTYPE node [ <!ATTLIST node id ID #REQUIRED> ]>]]>
</xsl:text>
<xsl:choose>
<xsl:when test="count(. | ../namespace::*) !=
count(../namespace::*)">
<xsl:apply-templates select="." mode="idcopy"/>
</xsl:when>
<xsl:otherwise>
<node id="{generate-id()}" type="namespace"
name="{name()}" value="{.}"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="@*" mode="idcopy">
<node id="{generate-id()}" type="attribute"
name="{name()}" value="{.}"/>
</xsl:template>
<xsl:template match="node()" mode="idcopy">
<node id="{generate-id()}" type="element" name="{name()}">
<xsl:apply-templates select="@*" mode="idcopy"/>
<xsl:for-each select="namespace::*">
<xsl:if test="not(.=../../namespace::*) and name()!='xml'">
<node id="{generate-id()}" type="namespace"
name="{name()}" value="{.}"/>
</xsl:if>
</xsl:for-each>
<xsl:apply-templates mode="idcopy"
select="*|text()|comment()|processing-instruction()"/>
</node>
</xsl:template>
<xsl:template match="comment()" mode="idcopy">
<node id="{generate-id()}" type="comment" value="{.}"/>
</xsl:template>
<xsl:template match="processing-instruction()" mode="idcopy">
<node id="{generate-id()}" type="processing-instruction"
value="{.}"/>
</xsl:template>
<xsl:template match="text()" mode="idcopy">
<node id="{generate-id()}" type="text" value="{.}"/>
</xsl:template>
</xsl:stylesheet>
$
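idc.xsl declares the key nodes-by-id but never uses it. One direction that might be worth trying, as an untested sketch, is to look the nodes up with key() instead of id(). key() works on whatever document contains the context node and does not depend on a DTD, but it does not split a string on white space the way id() does. The sketch therefore assumes that the processor provides the EXSLT str:tokenize() extension and that keys are usable on the result of exslt:node-set(). The sample nodes and id values are made up, and the key matches only node elements rather than node().

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exslt="http://exslt.org/common"
    xmlns:str="http://exslt.org/strings"
    exclude-result-prefixes="exslt str">
  <xsl:output omit-xml-declaration="yes"/>
  <xsl:key name="nodes-by-id" match="node" use="@id"/>

  <xsl:template match="/">
    <!-- stand-in for the result of template idcopy, without any DOCTYPE -->
    <xsl:variable name="rtf">
      <node id="n1"/><node id="n2"/><node id="n3"/>
    </xsl:variable>
    <!-- "n1" and "n2" each occur twice in this string -->
    <xsl:variable name="ids" select="'n2 n1 n2 n3 n1'"/>
    <!-- make the converted RTF the context document for key() -->
    <xsl:for-each select="exslt:node-set($rtf)">
      <!-- key() with a node-set argument returns the union of the lookups
           for each token, so the result cannot contain a node twice -->
      <xsl:copy-of select="key('nodes-by-id', str:tokenize($ids, ' '))"/>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

If both assumptions hold, this should copy each of the three node elements once, in document order.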
[1] http://stamm-wilbrandt.de/en/xsl-list/idc.xsl
Best wishes,
Hermann Stamm-Wilbrandt
Developer, XML Compiler, L3
WebSphere DataPower SOA Appliances
----------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Martin Jetter
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
From: Michael Kay <mike(_at_)saxonica(_dot_)com>
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Date: 08/24/2010 02:17 PM
Subject: Re: Fw: [xsl] Question on duplicate node elimination
I haven't understood your logic in any detail, but I wonder if it
suggests an alternative approach to the problem: namely, avoid creating
RTFs entirely, at least for intermediate results. Instead, whenever you
are evaluating an operation that returns a node-set, represent that
node-set as a string containing the generate-id values of the nodes in
the node-set, space-separated. Elimination of duplicates then reduces to
an operation on strings: not trivial, but not especially difficult either.
Michael Kay
Saxonica
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--