> (i.e. introducing an extra character between attribute
> name and value, which is unlikely to occur in the
> attribute value; e.g. a newline character)
How do you define unlikely? I can easily provide a counterexample.
(Although actually adding such a separator works even if
the separator is in the attribute value, as it uniquely terminates the
name in the string; you only need to use a character that is not a name
character.)
I mentioned attributes, but you do the same for elements, so you need the
same fix there (with a different character), as otherwise you don't
distinguish element nodes from attribute nodes of the same name.
I also notice that you don't record which element an attribute is on, so
looking at your proposed fix
<xsl:for-each select="$doc1//@*">
<xsl:value-of select="name()"
/><xsl:text>
</xsl:text><xsl:value-of select="."
/>
</xsl:for-each>
<x a="2">
<b/>
</x>
and
<x>
<b a="2"/>
</x>
would both generate the same attribute test string of
"a
2"
so would compare equal.
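
To make the collision concrete, here is a rough sketch in Python (not XSLT, just so it is easy to run) that mimics the for-each above and then a variant that also records the owner element. The function names and the "/x[1]/@a" path format are my own inventions for illustration, not part of the stylesheet under discussion.

```python
import xml.etree.ElementTree as ET

DOC_A = '<x a="2"><b/></x>'
DOC_B = '<x><b a="2"/></x>'

def flat_attr_string(root):
    """Mimic the quoted for-each: name, newline, value for every attribute."""
    parts = []
    for elem in root.iter():  # elements in document order, like //@*
        for name, value in sorted(elem.attrib.items()):
            parts.append(f"{name}\n{value}")
    return "".join(parts)

def owned_attr_string(elem, path="", position=1):
    """Hypothetical variant: prefix each attribute with a path to its owner."""
    here = f"{path}/{elem.tag}[{position}]"
    parts = [f"{here}/@{name}\n{value}\n"
             for name, value in sorted(elem.attrib.items())]
    for i, child in enumerate(elem, start=1):
        parts.append(owned_attr_string(child, here, i))
    return "".join(parts)

a, b = ET.fromstring(DOC_A), ET.fromstring(DOC_B)
print(flat_attr_string(a) == flat_attr_string(b))    # the two documents collide
print(owned_attr_string(a) == owned_attr_string(b))  # the owner path separates them
```

Sorting the attributes by name in the sketch also sidesteps any question of the order in which the parser reports them.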
> These documents are reported not equal!
Are you sure?
> I think here I am right!
hmm :-)
> For this example, the $doc1//node() path
> expression returns 4 nodes (2 element nodes and 2
> "white space text nodes")
yes
> The "white space text
> nodes" will be filtered by the predicate
> [not(normalize-space(self::text()) = '')]
Yes, but any element node will also be filtered, as self::text() on an
element node returns an empty node set (since it isn't a text node)
and normalize-space() on that returns ''.
So the whole select expression on the for-each returns an empty node
set.
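
The effect of that predicate can be simulated outside XSLT. The sketch below is Python with a made-up four-item node list standing in for $doc1//node() on the a/b example; it is only a model of the XPath semantics, not an XPath engine. One possible rewrite of the predicate, which I believe does what was intended, is [not(self::text()[normalize-space() = ''])], and the second function models that.

```python
# Stand-in for the four nodes that $doc1//node() returns for <a>\n<b/>\n</a>.
NODES = [("element", "a"), ("text", "\n"), ("element", "b"), ("text", "\n")]

def normalize_space(s):
    """Approximation of XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(s.split())

def self_text_string(node):
    """String value of self::text(): the text itself, or '' for an empty node set."""
    kind, value = node
    return value if kind == "text" else ""

def original_predicate(node):
    # Models [not(normalize-space(self::text()) = '')]
    return not (normalize_space(self_text_string(node)) == "")

def corrected_predicate(node):
    # Models [not(self::text()[normalize-space() = ''])]
    kind, value = node
    return not (kind == "text" and normalize_space(value) == "")

print([n for n in NODES if original_predicate(n)])   # nothing survives
print([n for n in NODES if corrected_predicate(n)])  # both elements survive
```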
> I agree that the XML parser is not expected to report
> attribute nodes in the same order. But I guess we can
> reasonably assume that a "specific XML parser" would
> report attributes in the same order.
More guesses.
> I have tested the same example with a single product
> multiple times, and I always get the same result.
Probably true, but you never really know. Attributes are often put into
some kind of hashed data structure, so the order they come out in can
depend on all sorts of strange factors.
These things can be fixed (e.g. by sorting attribute nodes
alphabetically), but as Michael just indicated, the process is always
likely to be very inefficient. You _always_ generate a really huge string
for each document, even if the top-level nodes are
<foo version="1"> and <foo version="2">
you'd really like to stop there and not generate a text string of the
100001 child nodes below foo.
Given that you are walking over the trees anyway to generate the
strings, you should be able to walk over the two trees in parallel and
stop whenever you find a difference.
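
As a rough illustration of the parallel walk, here is a minimal sketch in Python using the standard ElementTree parser rather than XSLT. The names same_tree and visited are hypothetical, and the sketch assumes comments and processing instructions can be ignored (the parser drops them by default). Attributes are compared as dictionaries, so their order does not matter.

```python
import xml.etree.ElementTree as ET

def _text(s, ignore_ws):
    """Treat missing text, and optionally whitespace-only text, as empty."""
    s = s or ""
    return "" if ignore_ws and not s.strip() else s

def same_tree(e1, e2, ignore_ws=True, visited=None):
    """Walk two element trees in step and stop at the first difference."""
    if visited is not None:
        visited.append(e1.tag)  # record how far the walk got
    if e1.tag != e2.tag or e1.attrib != e2.attrib:
        return False
    if _text(e1.text, ignore_ws) != _text(e2.text, ignore_ws):
        return False
    if _text(e1.tail, ignore_ws) != _text(e2.tail, ignore_ws):
        return False
    if len(e1) != len(e2):
        return False
    # all() over a generator short-circuits on the first unequal child pair
    return all(same_tree(c1, c2, ignore_ws, visited) for c1, c2 in zip(e1, e2))

big1 = ET.fromstring('<foo version="1">' + "<c/>" * 100001 + "</foo>")
big2 = ET.fromstring('<foo version="2">' + "<c/>" * 100001 + "</foo>")
visited = []
print(same_tree(big1, big2, visited=visited), len(visited))
```

With the two foo documents the walk should give up at the root element, so none of the 100001 children are ever looked at.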
David
See what saxon says:
$ saxon eq.xsl eq.xsl iws=y
Equal
$ cat file1.xml
<a>
<b/>
</a>
$ cat file2.xml
<x/>
So, when ignoring white space text nodes, the stylesheet reports
<a>
<b/>
</a>
as equal to
<x/>