xsl-list

Re: [xsl] Need an XPath 2.0 expression that identifies a long block of uninterrupted non-blocking space characters in an XHTML document

2019-10-14 12:37:05
Well, a predicate using name()='p' is bad news because it depends on namespace 
prefixes, which are arbitrary. Use self::p, assuming it's a no-namespace 
element, or self::xhtml:p if it's in the XHTML namespace.
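
To illustrate the point about prefixes, here is a minimal Python sketch (not part of the original reply) using the standard library's xml.etree.ElementTree. The three sample documents and the helper name child_tags are assumptions made up for the illustration. A name() test would typically see "p" in the first document and "h:p" in the second, even though both are the same XHTML element; a namespace-aware comparison, which is what self::xhtml:p does, sees one expanded name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample documents. The first two spell the same XHTML <p>
# with different prefixes; the third puts <p> in no namespace.
DEFAULT_NS_DOC = f'<html xmlns="{XHTML_NS}"><p>text</p></html>'
PREFIXED_DOC = f'<h:html xmlns:h="{XHTML_NS}"><h:p>text</h:p></h:html>'
NO_NS_DOC = "<html><p>text</p></html>"


def child_tags(xml_text):
    """Return the expanded names of the root's children in Clark notation."""
    return [child.tag for child in ET.fromstring(xml_text)]


# The prefix disappears once names are expanded, so both XHTML spellings agree.
print(child_tags(DEFAULT_NS_DOC))
print(child_tags(PREFIXED_DOC))
# A no-namespace <p> is a different name altogether.
print(child_tags(NO_NS_DOC))
```

The first two calls report the same namespace-qualified name, while the third reports a bare "p", which is the distinction between self::xhtml:p and self::p.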

You could also do something like

//p[o:p eq '&#160;'][every $p in following-sibling::*[position() le 10] 
satisfies $p[self::p/o:p eq '&#160;']]
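
For checking an expression like the one above against test data, the intended result can be sketched procedurally. This is a rough Python sketch, not part of the original reply; the names is_blank_p, first_blank_run and build_sample are hypothetical, and it assumes <p> is in no namespace, as in the sample document quoted below. It requires the full window of following siblings to exist, which is a choice made here: in XPath 2.0 an "every ... satisfies" over an empty sequence is true, so the two approaches may differ for a <p> near the end of its parent.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"  # the character written as &#160; in the XHTML
O_NS = "urn:schemas-microsoft-com:office:office"


def is_blank_p(elem):
    """True for a no-namespace <p> whose <o:p> child holds exactly one NBSP."""
    if elem.tag != "p":
        return False
    child = elem.find(f"{{{O_NS}}}p")
    return child is not None and child.text == NBSP


def first_blank_run(parent, following=10):
    """Index of the first blank <p> immediately followed by at least
    `following` more blank <p> siblings, or None if there is no such run."""
    children = list(parent)
    for i in range(len(children)):
        window = children[i : i + following + 1]
        if len(window) == following + 1 and all(is_blank_p(e) for e in window):
            return i
    return None


def build_sample(blank_count):
    """Build a document shaped like the Outlook sample, with a given number
    of NBSP-only paragraphs between the top and bottom text."""
    blanks = '<p class="MsoNormal"><o:p>&#160;</o:p></p>' * blank_count
    return ET.fromstring(
        f'<html xmlns:o="{O_NS}">'
        '<p class="MsoNormal">top text<o:p/></p>'
        f"{blanks}"
        '<p class="MsoNormal">bottom text<o:p/></p>'
        "</html>"
    )


print(first_blank_run(build_sample(11)))  # eleven blanks, as in the sample
print(first_blank_run(build_sample(10)))  # one blank short of the threshold
```

With eleven blank paragraphs the run starts at the child right after "top text"; with only ten, no paragraph has ten blank siblings after it, so nothing is reported.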

Michael Kay

On 14 Oct 2019, at 18:09, Costello, Roger L. costello(_at_)mitre(_dot_)org 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi Folks,

As you may know, when a formatted email message is created in Outlook, 
Outlook generates HTML under the hood.

I am trying to determine if a formatted email message has text at the bottom 
of the email message that is separated from the rest of the email by a lot of 
space. In other words, the text at the bottom of the underlying HTML is 
preceded by a bunch of non-breaking space characters (&#160;).

Assume the HTML has been converted to XHTML.

I need an XPath 2.0 expression that identifies a long block of non-breaking 
space characters.

Outlook generates HTML like that shown below. The non-breaking space 
character is nested inside an <o:p> element, which is nested inside a <p> 
element.

I came up with this XPath expression:

//p[o:p eq '&#160;'][count(following-sibling::*[position() le 10][name() eq 
'p'][o:p eq '&#160;']) ge 10][1]

It says, "Give me the first <p> element containing a non-breaking space 
character such that there are at least 10 <p> elements that immediately 
follow it, each containing a non-breaking space character." At least, that's 
what I think it says. Note: 10 is an arbitrary number.

Questions:
1. Do you see any problems with the XPath expression?
2. Is there a better XPath expression?

<html xmlns:o="urn:schemas-microsoft-com:office:office">
   <p class="MsoNormal">top text<o:p/></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal"><o:p>&#160;</o:p></p>
   <p class="MsoNormal">bottom text<o:p/></p>
</html>
