xsl-list
[Top] [All Lists]

Re: [xsl] [Summary] Three ways to express in XPath that there are no duplicates in a list of items

2012-11-02 08:01:38
count(Websites/*) = count(distinct-values(Websites/*))

A more efficient version (for an XPath processor with weak optimizer)
of this is:

not(Websites/*[count(distinct-values(Websites/*))+1])

So, we only count up to:
     count(distinct-values(Websites/*))+1

and don't need to count all children of Websites and then compare them
to the count of distinct values.

This is the same as Dirichlet's principle (aka "pigeonhole principle"):

If n items are put into m pigeonholes with n > m, then at least one
pigeonhole must contain more than one item.


Cheers,
Dimitre

On Fri, Nov 2, 2012 at 2:33 AM, Liam R E Quin <liam(_at_)w3(_dot_)org> wrote:
On Fri, 2012-11-02 at 09:15 +0000, Costello, Roger L. wrote:
[...]
Here's how to implement it in XPath 1.0 and in XPath 2.0.

XPath 1.0:

      not(Websites/*[. = preceding-sibling::*])

XPath 2.0:

      empty(Websites/*[index-of(../*,.)[2]])
      count(Websites/*) = count(distinct-values(Websites/*))

The preferred XPath is the last one because it has the best
performance. The first two take on the order of n-squared time (where
n is the number of websites in the list) whereas the last XPath
expression takes on the order of n log n time.

Some brief comments on this...
(1) the XPath 1 solution also works in later versions of XPath;
(2) this only works to test simple text content
(3) an XPath 2 (or later) processor can rewrite the first or second
expression if it likes (and is smart enough) into the third
(4) the last expression can be implemented in linear time, not n log n,
because it's not not necessary to sort values to see if they are
distinct (e.g. the implementation can use a hash table), although this
does use more memory. But for simple content the hash table approach
doesn't even need to use more memory, can just point into the document
tree.

So statements about performance need to be with respect to a particular
version of a specific product, in a specific environment.

And,
(5) I'm sure these aren't the only ways to see if all items are
distinct :-)

You can start getting pretty fancy,
some $w in Websites/Website satisfies count(Websites/Website[. eq $w])
gt 1,
or flwor expressions in XQuery,
or use a key or (XPath in XSLT 3) a map, or.... :)

Liam


--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org freenode/#xml


--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--