xsl-list
[Top] [All Lists]

Re: [xsl] Using memory addressing to retrieve a value vice using a software string library to retrieve a value

2015-11-20 10:39:02
In addition to the replies by Dr. Kay and David Carlisle:

It depends on the concrete case and the concrete XML document and the
string itself.

1. Imagine a string starting with a space and then having the wanted
substring (that is not too-long). This may be faster than scanning
sequentially through thousands of nodes -- as in general an XPath
engine may use a linked-list and not a vector.

2. On the other side, imagine an XPath engine that has the sequence of
nodes in a vector (self-expandable array structure). Then the wanted
node can be obtained in O(1). Imagine at the same time that you have a
sting long many Megs, snd the first space happens only at the end of
the string, and the wanted substring is also very long. In this case
searching through the sting using the specified expression can be
orders of magnitude slower.

A general remark about finding the first/all  occurrence of a given
string as a substring of another given string, or whether a string1
contains a string2.

I am aware of at least two very efficient algorithms:

  1. Using *suffix-trees*  -- all possible suffixes of the given
string are organized as a trie.  The search (same as the contains()
function) is very fast.

  2. Using the hash of the search-string and scanning the given string
and sequentially computing the hash of every next substring with the
same length as the search string. Because computing the hash of the
moving "window" can be done incrementally, this algorithm is O(N)  --
much better than the naïve O(N*M) algorithm. More about this -- in the
book of Steven Skiena "The Algorithm Design Manual",
http://www.amazon.com/Algorithm-Design-Manual-Steven-Skiena/dp/1849967202/ref=sr_1_1?s=books&ie=UTF8&qid=1448037467&sr=1-1&keywords=skiena&pebp=1448037471640&perid=0D1Z0T6Y90MCMDPDA330


Cheers,
Dimitre


To summarize:  All depends.



On Fri, Nov 20, 2015 at 4:13 AM, Costello, Roger L. 
costello(_at_)mitre(_dot_)org
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Hi Folks,

I want to retrieve "west".

Which of these is faster?

-------------------
Approach #1
-------------------
The <edge> element contains text:

        <edge>garden west door</edge>

"west" is retrieved using this XPath:

        substring-before(substring-after(., ' '), ' ')

Note: assume that <edge> is the context node.

-------------------
Approach #2
-------------------
The <edge> element contains markup:

        <edge>
            <garden/>
            <west/>
            <door/>
        </edge>

"west" is retrieved using this XPath:

        *[2]/name()


I believe that Approach #2 is much, much faster. Do you agree?

As I see it, Approach #2 is simply a memory addressing task (which computers 
do very well) to obtain <west/> and then output the symbols at that memory 
address. Approach #1, on the other hand, requires the use of software string 
libraries, which, I imagine, results in hundreds or thousands of machine 
instructions. In fact, I would imagine that Approach #2 is hundreds or 
thousands of times faster than Approach #1. Do you agree?

/Roger




-- 
Cheers,
Dimitre Novatchev
---------------------------------------
Truly great madness cannot be achieved without significant intelligence.
---------------------------------------
To invent, you need a good imagination and a pile of junk
-------------------------------------
Never fight an inanimate object
-------------------------------------
To avoid situations in which you might make mistakes may be the
biggest mistake of all
------------------------------------
Quality means doing it right when no one is looking.
-------------------------------------
You've achieved success in your field when you don't know whether what
you're doing is work or play
-------------------------------------
To achieve the impossible dream, try going to sleep.
-------------------------------------
Facts do not cease to exist because they are ignored.
-------------------------------------
Typing monkeys will write all Shakespeare's works in 200yrs.Will they
write all patents, too? :)
-------------------------------------
Sanity is madness put to good use.
-------------------------------------
I finally figured out the only reason to be alive is to enjoy it.
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--

<Prev in Thread] Current Thread [Next in Thread>