
Re: [xsl] XQuery/XPath 3.1: Node List to Node Set ("distinct nodes")

2021-12-28 18:46:28
As for performance, I compared the execution times of the two solutions
(the index-of expression vs. the fold-left / intersect / if-then-else one).

The XML document was: "<t><a/><b/><c/></t>".
The $nodes sequence contained 45 nodes:
$nodes := ($xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a,
$xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a, $xml/*/c, $xml/*/b,
$xml/*/a, $xml/*/b, $xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b,
$xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a, $xml/*/c,
$xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a,
$xml/*/b, $xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b, $xml/*/a,
$xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b )

Separately, I timed just the parse-xml() call and the construction of the
node sequence. All measurements were done with BaseX.
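
For anyone who wants to reproduce the comparison, here is a minimal sketch of
the setup on a shortened five-node sequence. The "short" expression is the
index-of one quoted further down in this thread. The "long" expression is not
quoted in this message, so the fold-left form below is only an assumed
reconstruction of what a fold-left / intersect / if-then-else solution could
look like, not necessarily the one that was timed.

```xquery
(: Sketch only: shortened $nodes; the "long" form is an assumption. :)
let $xml   := parse-xml('<t><a/><b/><c/></t>')
let $nodes := ($xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b)

(: "short": keep a node only if its position equals the first index of its id :)
let $short := $nodes[index-of($nodes ! generate-id(.), generate-id(.))[1]]

(: "long" (assumed): append a node only if it is not already accumulated :)
let $long  := fold-left($nodes, (),
                function($acc, $n) {
                  if ($acc intersect $n) then $acc else ($acc, $n)
                })

(: both keep the first occurrence of each node, in sequence order :)
return ($short ! name(), '|', $long ! name())
```

Both halves of the result should list the element names in order of first
appearance in $nodes (a, c, b), not in document order.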

Results:
Parsing the XML document and constructing the sequence: 0.10 ms
Evaluating the "short" expression: 0.41 ms
Evaluating the "long" expression: 0.44 ms

"short" vs. "long" with the parsing time subtracted: 0.31 ms vs. 0.34 ms

Thus both expressions have approximately the same efficiency, though in
this particular measurement the "short" one was about 10% faster than the
"long" one (I suspect this difference is not statistically significant).

Cheers,
Dimitre


On Tue, Dec 28, 2021 at 4:23 PM Dimitre Novatchev 
dnovatchev(_at_)gmail(_dot_)com <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:



On Tue, Dec 28, 2021 at 4:10 PM Michael Kay mike(_at_)saxonica(_dot_)com <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:



On 28 Dec 2021, at 23:54, Dimitre Novatchev 
dnovatchev(_at_)gmail(_dot_)com <
xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:


This seems a candidate for "the shortest solution" and it shouldn't be
inefficient, given a good optimizer:

   $nodes[index-of($nodes ! generate-id(.), generate-id(.))[1]]


It probably also gets a prize for the first practical use case of a
filter expression where the predicate is numeric and has different values
for different nodes in the input sequence.

It's going to be O(n*m) unless index-of() is optimized to use some kind
of index or hash lookup rather than a sequential search. That's assuming
that the expression $nodes ! generate-id(.) gets loop-lifted; if it isn't,
then it becomes O(n*n*m).


It seems BaseX is good enough to do this. I tripled the number of nodes in
$nodes and there was no increase in the evaluation time.
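
The "index or hash lookup" idea mentioned above can also be made explicit
rather than left to the optimizer. The following is a minimal sketch, assuming
an XQuery 3.1 processor with map support: the ids are computed once, a map
from each id to its first position is built once, and the predicate then does
a map lookup instead of a sequential index-of() search. The variable names
$ids and $first are illustrative only.

```xquery
(: Sketch only: explicit hash lookup instead of index-of(). :)
let $xml   := parse-xml('<t><a/><b/><c/></t>')
let $nodes := ($xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b)

(: compute the ids once, outside the predicate :)
let $ids   := $nodes ! generate-id(.)

(: map each id to the position of its first occurrence :)
let $first := map:merge(
                (for $id at $i in $ids return map:entry($id, $i)),
                map { 'duplicates': 'use-first' })

(: keep a node only at the position recorded for its id :)
return $nodes[position() = $first(generate-id(.))] ! name()
```

Whether this is actually faster than the index-of form depends on the
processor; for sequences as small as the ones measured here it is unlikely
to matter.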



Aesthetically, I find generate-id() ugly and it would be nice to avoid it.


Its name is ugly, yes. A shorter and more meaningful name, like id() or
key() would be much better. Maybe we need a mechanism in XPath 4.0 to
specify global aliases (like a using file... )
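
On the aesthetic point, generate-id() can be avoided altogether by comparing
nodes with the "is" operator. The following is a minimal sketch of that idea;
it does a sequential scan of the preceding items for every node, so it is
quadratic unless the processor optimizes it, and it is offered as an
illustration only, not as a solution proposed in this thread.

```xquery
(: Sketch only: first occurrences by node identity, no generate-id(). :)
let $xml   := parse-xml('<t><a/><b/><c/></t>')
let $nodes := ($xml/*/a, $xml/*/c, $xml/*/b, $xml/*/a, $xml/*/b)
return
  $nodes[
    (: keep the node unless an identical node appears earlier in $nodes :)
    not(some $p in subsequence($nodes, 1, position() - 1)
        satisfies $p is .)
  ] ! name()
```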


Cheers,
Dimitre



Michael Kay
Saxonica




--
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.

XSL-List info and archive <http://www.mulberrytech.com/xsl/xsl-list>
EasyUnsubscribe <http://lists.mulberrytech.com/unsub/xsl-list/782854>


