xsl-list
[Top] [All Lists]

Re: [xsl] Function for determining one XPath as subset of another

2016-01-26 15:35:30
On Tue, 2016-01-26 at 16:15 +0000, Adam Retter
adam(_dot_)retter(_at_)googlemail(_dot_)com wrote:
Given two simple XPaths, say:

1. //w

2. /x/y/z/w[@a = 'v']
 [...]
I wondered if there might be an algorithm or library that someone
already had or has written which might be able to give me the answer?

There's quite a bit of literature on optimizing XPath in the context of
databases (e.g. in proc. VLDB, in "XML-Based Data Management and
Multimedia Engineering" and elsewhere) but I don't know of a single
coherent summary.

This may be at least in part because what's useful and practical
depends on the implementation and the situation.

For example, a system with an element occurrence index might solve //w
in O(n_w) time, where n_w is the number of occurrences of w elements in
the corpus being searched; one using a tree structure might do better
with the explicit path.

A path like //a//b//c//d is much harder using tree navigation, but
rewritten as d[ancestor::c[ancestor::b[ancestor::a]]] against an
element index will likely run in time close to O(n_d log D) where D is
the average depth of the tree. A system smart enough to notice that
there are only three "c" elements in the corpus could do better than
O(n_d) though (my text retrieval system did/does that to speed up
multi-word phrase searches, for example).

We encountered each of these implementation strategies during the
development of XQuery.  The cost of building an element index for a
non-database-backed XPath implementation is in some systems amortized
by doing it during parsing, while the input XML is being read; systems
that run XPath against a stored or in-memory tree might not have that
luxury, or might have the even greater luxury of a precomputed index.

I don't know of anything off the shelf, and although open source XQuery
implementations might be a place to look, it's likely that
optimizations are done on an internal representation of the XPath
expression.


I realise that I can only probably cover a subset of XPath itself,
but
it is only the path steps with predicates which I am interested in.

Ideally I am looking for something in Java.

--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--

<Prev in Thread] Current Thread [Next in Thread>