xsl-list

Re: XPath: better way to check for text nodes that aren't descendants of x or y nodes?

2003-05-14 12:07:52
Using the well-known XPath expression for set difference:

  $ns1 - $ns2 =

   $ns1[not(count(. | $ns2) = count($ns2))]
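To see why the predicate works, here is a minimal Python sketch of the same idiom, assuming node-sets can be modelled as Python sets (the function name `set_difference` is made up for illustration). A node from $ns1 belongs to $ns2 exactly when adding it to $ns2 leaves the count unchanged:

```python
def set_difference(ns1, ns2):
    """Mimic $ns1[not(count(. | $ns2) = count($ns2))] on Python collections.

    ns1 stays a list so the result keeps its original ("document") order;
    ns2 becomes a set, like an XPath node-set, which holds no duplicates.
    """
    ns2 = set(ns2)
    # x is in ns2 exactly when the union {x} | ns2 is no bigger than ns2.
    return [x for x in ns1 if len(ns2 | {x}) != len(ns2)]


print(set_difference(["a", "b", "c", "d"], {"b", "d"}))  # ['a', 'c']
```

In Python one would normally write `x not in ns2`; the union-and-count form is needed in XPath 1.0 only because it has no operator for comparing node identity.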

One would test whether there is a non-empty set difference between the set
of all descendant text nodes and the set of text nodes that are descendants
of the "llcd:vernac" or "llcd:gloss" descendants of the current node:

The expression to test is:

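   descendant::text()
      [not(count(. | current()//*[self::llcd:vernac or self::llcd:gloss]//text())
           =
           count(current()//*[self::llcd:vernac or self::llcd:gloss]//text()))
      ]

(Inside the predicate the context node is the text node itself, so the
inner paths have to start from the XSLT current() node rather than from ".".)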
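Outside XSLT, the same set-difference check can be sketched in Python with the standard xml.etree.ElementTree module, with the inner path evaluated relative to the current node as intended. This is only a model: ElementTree keeps character data in .text/.tail strings instead of text nodes, so a text node is represented here as an (element, "text"/"tail") pair. The llcd namespace URI and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI: the thread never says what llcd is bound to.
LLCD = "urn:example:llcd"
WRAPPERS = {f"{{{LLCD}}}vernac", f"{{{LLCD}}}gloss"}


def text_nodes(ctx):
    """Yield the descendant text nodes of ctx (XPath .//text()) in document order.

    A text node is modelled as (element, "text") or (element, "tail");
    None means there is no text node at that position.
    """
    if ctx.text:
        yield (ctx, "text")
    for child in ctx:
        yield from text_nodes(child)
        if child.tail:
            yield (child, "tail")


def stray_text(ctx):
    """descendant::text() minus .//*[self::vernac or self::gloss]//text()."""
    excluded = set()
    for child in ctx:                 # .//* means proper descendants only
        for elem in child.iter():
            if elem.tag in WRAPPERS:
                excluded.update(text_nodes(elem))
    # Set difference via the count(. | $ns2) = count($ns2) membership test.
    return [getattr(el, kind) for el, kind in text_nodes(ctx)
            if len(excluded | {(el, kind)}) != len(excluded)]


ok = ET.fromstring('<entry xmlns:llcd="urn:example:llcd">'
                   '<llcd:vernac>casa</llcd:vernac>'
                   '<llcd:gloss>house</llcd:gloss></entry>')
bad = ET.fromstring('<entry xmlns:llcd="urn:example:llcd">'
                    '<llcd:vernac>casa</llcd:vernac>stray'
                    '<llcd:gloss>house</llcd:gloss></entry>')
print(stray_text(ok))   # []
print(stray_text(bad))  # ['stray']
```

Like a naive processor, this rebuilds a union for every candidate text node, which is exactly the cost discussed further down.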


Or, much more simply:

  count(.//text()) != count(.//*[self::llcd:vernac or self::llcd:gloss]//text())
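The count comparison works because the text nodes inside llcd:vernac or llcd:gloss are a subset of all descendant text nodes, so the two counts differ exactly when some text lies outside them. A Python sketch with ElementTree (namespace URI and function names hypothetical) has to count each text node only once even when one wrapper is nested inside another, because XPath node-sets never contain duplicates:

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; the thread does not give the real one.
LLCD = "urn:example:llcd"
WRAPPERS = {f"{{{LLCD}}}vernac", f"{{{LLCD}}}gloss"}


def count_wrapped_text(elem):
    """count(.//*[self::vernac or self::gloss]//text()) below elem.

    Stops at the outermost wrapper, so a vernac nested in a gloss is not
    counted twice (a node-set union would not count it twice either).
    """
    total = 0
    for child in elem:
        if child.tag in WRAPPERS:
            total += sum(1 for _ in child.itertext())
        else:
            total += count_wrapped_text(child)
    return total


def has_stray_text(ctx):
    """count(.//text()) != count(.//*[self::vernac or self::gloss]//text())"""
    return sum(1 for _ in ctx.itertext()) != count_wrapped_text(ctx)


ok = ET.fromstring('<entry xmlns:llcd="urn:example:llcd">'
                   '<llcd:vernac>casa</llcd:vernac>'
                   '<llcd:gloss>house</llcd:gloss></entry>')
bad = ET.fromstring('<entry xmlns:llcd="urn:example:llcd">'
                    '<llcd:vernac>casa</llcd:vernac>stray'
                    '<llcd:gloss>house</llcd:gloss></entry>')
nested = ET.fromstring('<entry xmlns:llcd="urn:example:llcd"><llcd:gloss>a'
                       '<llcd:vernac>b</llcd:vernac></llcd:gloss></entry>')
print(has_stray_text(ok))      # False
print(has_stray_text(bad))     # True
print(has_stray_text(nested))  # False
```

Summing itertext() over every wrapper instead would count "b" twice in the nested case and wrongly report stray text.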



> My current test is
>  test=".//text()[not(ancestor::llcd:vernac | ancestor::llcd:gloss)]"

In the general case this is not correct, because it lets through "illegal"
text nodes whose llcd:vernac or llcd:gloss ancestor is not a descendant of
the current node but an ancestor of it.
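A small Python sketch of this pitfall, assuming ElementTree as the tree model (it has no parent links, so a parent map is built; the namespace URI and function names are hypothetical). When the current node sits inside an llcd:vernac, the ancestor-based test finds that wrapper above the current node and reports nothing, while a test limited to wrappers below the current node reports the stray text:

```python
import xml.etree.ElementTree as ET

LLCD = "urn:example:llcd"  # hypothetical namespace URI
WRAPPERS = {f"{{{LLCD}}}vernac", f"{{{LLCD}}}gloss"}


def text_owners(ctx):
    """Yield (parent_element, text) for every descendant text node of ctx."""
    if ctx.text:
        yield ctx, ctx.text
    for child in ctx:
        yield from text_owners(child)
        if child.tail:              # a tail belongs to the child's parent
            yield ctx, child.tail


def strays(ctx, parent_of, stop_at_ctx):
    """Texts under ctx that have no llcd:vernac/llcd:gloss ancestor.

    stop_at_ctx=False behaves like ancestor::llcd:vernac | ancestor::llcd:gloss,
    which looks all the way up to the root; stop_at_ctx=True only accepts
    wrappers that are proper descendants of ctx.
    """
    found = []
    for parent, text in text_owners(ctx):
        node, wrapped = parent, False
        while node is not None and not (stop_at_ctx and node is ctx):
            if node.tag in WRAPPERS:
                wrapped = True
                break
            node = parent_of.get(node)
        if not wrapped:
            found.append(text)
    return found


root = ET.fromstring('<llcd:vernac xmlns:llcd="urn:example:llcd">'
                     '<entry>stray</entry></llcd:vernac>')
parent_of = {c: p for p in root.iter() for c in p}
ctx = root[0]                                     # the <entry> element
print(strays(ctx, parent_of, stop_at_ctx=False))  # []
print(strays(ctx, parent_of, stop_at_ctx=True))   # ['stray']
```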

Apart from this, a non-clever XSLT processor will build the union in the
predicate, and this is quite an expensive operation. I think it would be
more efficient to rewrite the expression as:

.//text()[not(ancestor::llcd:vernac or ancestor::llcd:gloss)]
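How much a given processor saves depends on its implementation; XPath 1.0 itself only guarantees that the right operand of `or` is not evaluated when the left one is true. The Python sketch below (ElementTree standing in for the XPath tree; names and namespace URI hypothetical) contrasts a literal evaluation of the union with a short-circuiting `or`. Both must return the same answer; only the amount of work differs.

```python
import xml.etree.ElementTree as ET

LLCD = "urn:example:llcd"  # hypothetical namespace URI
VERNAC, GLOSS = f"{{{LLCD}}}vernac", f"{{{LLCD}}}gloss"


def text_ancestors(owner, parent_of):
    """ancestor:: axis of a text node whose parent element is owner."""
    node = owner
    while node is not None:
        yield node
        node = parent_of.get(node)


def wrapped_by_union(owner, parent_of):
    # ancestor::llcd:vernac | ancestor::llcd:gloss taken literally:
    # build both node-sets in full, form their union, then test emptiness.
    vernacs = {a for a in text_ancestors(owner, parent_of) if a.tag == VERNAC}
    glosses = {a for a in text_ancestors(owner, parent_of) if a.tag == GLOSS}
    return len(vernacs | glosses) > 0


def wrapped_by_or(owner, parent_of):
    # ancestor::llcd:vernac or ancestor::llcd:gloss: each side only needs
    # one match, and the right side is skipped once the left side is true.
    return (any(a.tag == VERNAC for a in text_ancestors(owner, parent_of))
            or any(a.tag == GLOSS for a in text_ancestors(owner, parent_of)))


root = ET.fromstring('<entry xmlns:llcd="urn:example:llcd">'
                     '<llcd:gloss><b>x</b></llcd:gloss><i>y</i></entry>')
parent_of = {c: p for p in root.iter() for c in p}
b, i = root[0][0], root[1]
print(wrapped_by_union(b, parent_of), wrapped_by_or(b, parent_of))  # True True
print(wrapped_by_union(i, parent_of), wrapped_by_or(i, parent_of))  # False False
```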



=====
Cheers,

Dimitre Novatchev.
http://fxsl.sourceforge.net/ -- the home of FXSL


"Lars Huttar" <lars_huttar(_at_)sil(_dot_)org> wrote in message
news:002c01c31a33$1a688ea0$250414ac(_at_)LarsandKate(_dot_)(_dot_)(_dot_)
Hi all,
My requirement is to check the validity of certain XML data as follows:
all text() nodes descended from . must be descendants of either
llcd:vernac or llcd:gloss.
(By the way, if it helps, the llcd:vernac or llcd:gloss will be
descendants of . too, not ancestors.)

My current test is
 test=".//text()[not(ancestor::llcd:vernac | ancestor::llcd:gloss)]"

If this test is true, the data is invalid.

But is there a more efficient way to do this?
Something that checks for llcd:vernac|llcd:gloss along the way,
instead of going down the descendant axis and then back up the
ancestor axis (twice)?  Something along the lines of
  test="./(not(llcd:vernac|llcd:gloss)/)*/text()"
where * means "0 or more times".
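XPath 1.0 has no such repetition operator. In XSLT the same idea would typically be written as templates applied recursively that simply do not descend into llcd:vernac or llcd:gloss. The Python sketch below only illustrates that top-down traversal, assuming ElementTree as the tree model (namespace URI and function name hypothetical): it never enters a wrapper, never walks back up an ancestor axis, and stops at the first stray text.

```python
import xml.etree.ElementTree as ET

LLCD = "urn:example:llcd"  # hypothetical namespace URI
WRAPPERS = {f"{{{LLCD}}}vernac", f"{{{LLCD}}}gloss"}


def first_stray_text(elem):
    """Return the first text under elem lying outside every vernac/gloss.

    Walks down in document order, skips the contents of wrapper elements,
    and returns None if all text is wrapped. Each node is visited at most once.
    """
    if elem.text:
        return elem.text
    for child in elem:
        if child.tag not in WRAPPERS:
            found = first_stray_text(child)
            if found is not None:
                return found
        if child.tail:          # text after </child> belongs to elem, not child
            return child.tail
    return None


ok = ET.fromstring('<entry xmlns:llcd="urn:example:llcd">'
                   '<llcd:vernac>casa</llcd:vernac>'
                   '<llcd:gloss>house</llcd:gloss></entry>')
bad = ET.fromstring('<entry xmlns:llcd="urn:example:llcd">'
                    '<div><llcd:vernac>casa</llcd:vernac>stray</div></entry>')
print(first_stray_text(ok))   # None
print(first_stray_text(bad))  # stray
```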

I guess I could do
 test="count(.//text()) >
       count(.//llcd:vernac//text() | .//llcd:gloss//text())"
but I'm not sure that's any more efficient.

This is not a big deal, just wanting to be as efficient as reasonably possible.

Lars


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
