xsl-list

Re: [xsl] Comparing documents: what of P is a subset of D?

2014-02-27 08:32:47
I'm not sure I've completely understood your "equality" relation that underpins 
the intersection. Perhaps it's based on equality of the function

string-join(ancestor-or-self::*/@_ix, '|')

Let's call this function $f; it can then be passed as a parameter to the rest
of the solution.

We then need to evaluate

doc('d.xml')//fc[some $e in doc('p.xml')//fc satisfies $f($e) eq $f(.)] ! path(.)

where path(.) is a function you can write to display the path to the selected 
fc element.

The only remaining problem is that this is O(n*m) where n and m are the sizes 
of D and P. For a more efficient solution, define a key on P.XML that indexes 
each element on the value of the function $f, and replace the predicate by a 
call on key().

The above uses XPath 3.0, but it can probably be expressed in XPath 2.0 easily 
enough at the cost of hard-coding the equality function.
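
To show the indexing idea in isolation, here is a minimal sketch in Python
rather than XSLT (a swapped-in language, not part of the original answer). It
assumes the equality key really is the '|'-joined chain of @_ix values guessed
at above. The helper name index_keys and the trimmed sample data are
hypothetical. In XSLT the same role would be played by xsl:key and key().

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample data modelled on the D and P documents in this thread.
D_XML = """<cca>
 <rela _ix='0'><fc _ix='1'/><fc _ix='2'/></rela>
 <rela _ix='1'><fc _ix='1'/><fc _ix='2'/></rela>
 <rela _ix='5'><fc _ix='1'/><fc _ix='2'/></rela>
</cca>"""

P_XML = """<cca>
 <rela _ix='1'><fc _ix='1'/></rela>
 <rela _ix='5'><fc _ix='1'/><fc _ix='2'/></rela>
</cca>"""


def index_keys(root, tag):
    """Yield (key, path) for every <tag> element under root.

    key mirrors string-join(ancestor-or-self::*/@_ix, '|'): only the @_ix
    values are joined, element names are not part of the key.
    """
    def walk(elem, chain, path):
        ix = elem.get("_ix")
        new_chain = chain + ([ix] if ix is not None else [])
        here = path + "/" + elem.tag + (f"({ix})" if ix is not None else "")
        if elem.tag == tag:
            yield "|".join(new_chain), here
        for child in elem:
            yield from walk(child, new_chain, here)

    yield from walk(root, [], "")


d_root = ET.fromstring(D_XML)
p_root = ET.fromstring(P_XML)

# Build the index over P once, then probe it once per fc element of D,
# instead of scanning all of P for every element of D.
p_keys = {key for key, _ in index_keys(p_root, "fc")}
matches = [path for key, path in index_keys(d_root, "fc") if key in p_keys]

for path in matches:
    print(path)
```

The set lookup replaces the nested "some ... satisfies" scan, so the work is
roughly proportional to n + m rather than n * m.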

Michael Kay
Saxonica


On 27 Feb 2014, at 10:25, Wolfgang Laun 
<wolfgang(_dot_)laun(_at_)gmail(_dot_)com> wrote:

<cca><!-- a D XML -->
 <rela _ix='0' fa='0' fb='1'>
    <fc _ix='1' fc_fa='X1' fc_fb='1'/>
    <fc _ix='2' fc_fa='X2' fc_fb='2'/>
 </rela>
 <rela _ix='1' fa='10' fb='11'>
    <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
    <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
 </rela>
 <rela _ix='5' fa='50' fb='51'>
    <fc _ix='1' fc_fa='A1' fc_fb='51'/>
    <fc _ix='2' fc_fa='A2' fc_fb='52'/>
 </rela>
 <relb>...</relb>
 <relc>...</relc>
</cca>

<cca><!-- a P XML -->
 <rela _ix='1' fa='10'>
    <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
 </rela>
 <rela _ix='5' fa='50' fb='51'>
    <fc _ix='1'                 fc_fb='51' fc_fc='123'/>
    <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
 </rela>
</cca>

Expected output:

/cca/rela(1)/fa   10
/cca/rela(1)/fc(1)/fc_fa   Y1
/cca/rela(5)/fa   50
/cca/rela(5)/fb   51
/cca/rela(5)/fc(1)/fc_fb   51
/cca/rela(5)/fc(2)/fc_fa   A2
/cca/rela(5)/fc(2)/fc_fb   52

Note that parentheses enclose values of @_ix.

-W

On 27/02/2014, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
It would be easier to understand the problem with some example data.

Michael Kay
Saxonica

On 27 Feb 2014, at 08:05, Wolfgang Laun 
<wolfgang(_dot_)laun(_at_)gmail(_dot_)com> wrote:

The data model for a set of similarly (but not identically) built XML
documents is: a collection of arrays of records, which may contain
(recursively) arrays, records and scalars. (The terms "array" and
"record" are used in their "classic" meaning as, e.g., in Pascal.)
Document structures are fairly stable, but they do change over time.
Array elements are identified (indexed) by @_ix, not by position.
Record fields can be elements or attributes (when they are scalar).
Order is undefined, since XPaths plus @_ix values pinpoint each node.

One XML document D contains a full population for such a data set
(O(1MB)). A second XML document P contains "patches", i.e., each node
appearing in P is expected to be in D as well.

If S(P) is the sequence of nodes (annotated with their XPaths) in P
and S(D) the one with nodes from D, how can I determine S(P) intersect
S(D) (except all @_ix, whose values are bound to be identical)? Of
course, I don't want the common set of *data items* - I want the XML
paths of those common data items.

A solution (in XSLT 2.0) should not need individual adaptation for each
kind of data set.

I'm confident that I can create text files for D and P containing one
line <path> <value> for each node and run diff (after sort).
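
The "one line per <path> <value>" idea can be sketched as follows. This is
Python, not XSLT, and only an illustration under assumptions: the function
name flatten is hypothetical, scalar fields are taken to be attributes or the
text of leaf elements, and @_ix is used as the array index in each path
instead of being reported as a value. A set intersection stands in for the
sort-and-diff step.

```python
import xml.etree.ElementTree as ET

D_XML = """<cca>
 <rela _ix='0' fa='0' fb='1'>
    <fc _ix='1' fc_fa='X1' fc_fb='1'/>
    <fc _ix='2' fc_fa='X2' fc_fb='2'/>
 </rela>
 <rela _ix='1' fa='10' fb='11'>
    <fc _ix='1' fc_fa='Y1' fc_fb='11'/>
    <fc _ix='2' fc_fa='Y2' fc_fb='12'/>
 </rela>
 <rela _ix='5' fa='50' fb='51'>
    <fc _ix='1' fc_fa='A1' fc_fb='51'/>
    <fc _ix='2' fc_fa='A2' fc_fb='52'/>
 </rela>
</cca>"""

P_XML = """<cca>
 <rela _ix='1' fa='10'>
    <fc _ix='1' fc_fa='Y1' fc_fb='99'/>
 </rela>
 <rela _ix='5' fa='50' fb='51'>
    <fc _ix='1' fc_fb='51' fc_fc='123'/>
    <fc _ix='2' fc_fa='A2' fc_fb='52' fc_fc='456'/>
 </rela>
</cca>"""


def flatten(elem, prefix=""):
    """Yield (path, value) pairs for every scalar field below elem."""
    ix = elem.get("_ix")
    here = prefix + "/" + elem.tag + (f"({ix})" if ix is not None else "")
    # Scalar fields stored as attributes; @_ix itself is part of the path only.
    for name, value in elem.attrib.items():
        if name != "_ix":
            yield here + "/" + name, value
    # Scalar fields stored as leaf elements with text content.
    if len(elem) == 0 and elem.text and elem.text.strip():
        yield here, elem.text.strip()
    for child in elem:
        yield from flatten(child, here)


d_items = set(flatten(ET.fromstring(D_XML)))
p_items = set(flatten(ET.fromstring(P_XML)))

# A pair is common only if both the path and the value agree in D and P.
common = sorted(d_items & p_items)

for path, value in common:
    print(f"{path}   {value}")
```

Because the paths are built generically from element names, attribute names
and @_ix, nothing here is specific to one kind of data set.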

Any better ideas?

Cheers
Wolfgang

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


