xsl-list
[Top] [All Lists]

Re: [xsl] Find the number of elements that are prior to the series of elements that match a string?

2019-03-12 16:46:32
On Tue, Mar 12, 2019 at 1:31 PM Michael Kay mike(_at_)saxonica(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

In performance terms, of course, a solution using keys incurs a one-time cost 
to build indexes, which is amortized over a number of searches; so the 
performance benefits depend on whether the search is done once, or repeatedly.

If there are "hundreds of thousands" of Byte elements, the space used by the 
index can become a significant factor.

Also, a solution using keys will almost inevitably process the whole input 
sequence. Other solutions are likely to stop reading the input sequence when 
the first match is found.

Wouldn't "hundreds of thousands" of Byte elements be also a (both
performance and) space problem when using index-of() -- especially if
a starting subsequence of the searched string happens regularly in the
document? Also RegEx matching becomes not so efficient with large
strings, too.

It would be nice if a particular XSLT processor implements lazily the
index-of() function, allowing its partial results to be immediately
accessible to the evaluator of the whole XPath expression -- in this
case only, the solution using index-of() will stop immediately after
the successful index value is produced -- otherwise the evaluator will
need to perform on the complete index.

Of course, exactly the same reasoning applies to the implementation of
<xsl:key> and the key() function, if a smart enough XSLT processor
allows the key() function to operate on a partially-built index, and
when a solution is found the building of the index is halted.

So, **depending on implementation details**, the other solutions
**may** have a better average performance for a single search. With
multiple searches though, using keys will be more performant. Also,
depending on where the sought substring is located for the first time
in the document, if it isn't found at all or is at the end of the
sequence of Byte elements, then exhaustive string scanning will be
performed with these other solutions too.

If we accept that an "average" search will involve scanning half of
the string, then if 3 or more searches are performed, the key-based
solution will have incurred the cost of processing the document just
once, while the accumulated string scanning of the other solutions
will be over greater (accumulated) length of sequences of Byte
elements.

Maybe we need a performance comparison of the different solutions,
over different source XML documents that cover these cases?

Cheers,
Dimitre

Michael Kay
Saxonica

On 12 Mar 2019, at 20:20, Dimitre Novatchev dnovatchev(_at_)gmail(_dot_)com 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Here is an XSLT 2.0 solution using keys. In its current form it
returns the wanted offset (length) as a hexadecimal number. To convert
this to a decimal number, have a look at the archives where AFAIR
there is such code -- could be even posted by me ... This conversion
to decimal can certainly be written elegantly as fold-left() in
XSLT/XPath 3


XSL-List info and archive
EasyUnsubscribe (by email)



-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--

<Prev in Thread] Current Thread [Next in Thread>