
[xsl] Need an XPath 2.0 expression that identifies a long block of uninterrupted non-breaking space characters in an XHTML document

2019-10-14 12:08:53
Hi Folks,

As you may know, when a formatted email message is created in Outlook, Outlook 
generates HTML under the hood.

I am trying to determine whether a formatted email message has text at the
bottom that is separated from the rest of the message by a lot of empty
space. In other words, the text at the bottom of the underlying HTML is
preceded by a long run of non-breaking space characters (&#160;).

Assume the HTML has been converted to XHTML.

I need an XPath 2.0 expression that identifies a long block of non-breaking
space characters.

Outlook generates HTML like that shown below. Each non-breaking space
character is nested inside an <o:p> element, which is nested inside a <p>
element.

I came up with this XPath expression:

//p[o:p eq '&#160;']
   [count(following-sibling::*[position() le 10][name() eq 'p'][o:p eq '&#160;']) ge 10]
   [1]

It says, "Give me the first <p> element containing a non-breaking space
character such that there are at least 10 <p> elements that immediately follow
it, each containing a non-breaking space character." At least, that's what I
think it says. Note: 10 is an arbitrary number.

Questions:
1. Do you see any problems with the XPath expression?
2. Is there a better XPath expression?

<html xmlns:o="urn:schemas-microsoft-com:office:office">
    <p class="MsoNormal">top text<o:p/></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal"><o:p>&#160;</o:p></p>
    <p class="MsoNormal">bottom text<o:p/></p>
</html>
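
To cross-check what the expression is meant to find, the same idea can be
sketched outside XPath. The following is a minimal Python sketch using only
the standard library, not an XPath 2.0 solution. The names is_blank_p,
blank_runs and min_run are hypothetical, and the sample is a whitespace-free
copy of the document above. It treats a <p> as "blank" when it has exactly
one <o:p> child whose text is a single non-breaking space, and it reports
runs of consecutive blank sibling <p> elements. A min_run of 11 corresponds
to one <p> plus its 10 following siblings.

```python
import xml.etree.ElementTree as ET

NBSP = "\u00a0"
O_NS = "urn:schemas-microsoft-com:office:office"
O_P = "{%s}p" % O_NS  # ElementTree's spelling of the o:p element name


def is_blank_p(el):
    """True if el is a <p> with exactly one <o:p> child whose text is one NBSP."""
    local = el.tag.rsplit("}", 1)[-1]  # local name, ignoring any namespace
    if local != "p" or el.tag == O_P:  # must be a <p>, but not <o:p> itself
        return False
    ops = el.findall(O_P)
    return len(ops) == 1 and "".join(ops[0].itertext()) == NBSP


def blank_runs(root, min_run=11):
    """Return each run of at least min_run consecutive blank sibling <p> elements."""
    runs = []
    for parent in root.iter():
        run = []
        # The trailing None is a sentinel that flushes a run ending at the last child.
        for child in list(parent) + [None]:
            if child is not None and is_blank_p(child):
                run.append(child)
            else:
                if len(run) >= min_run:
                    runs.append(run)
                run = []
    return runs


# Assumption: same structure as the sample document, with 11 blank paragraphs.
SAMPLE = (
    '<html xmlns:o="urn:schemas-microsoft-com:office:office">'
    '<p class="MsoNormal">top text<o:p/></p>'
    + '<p class="MsoNormal"><o:p>&#160;</o:p></p>' * 11
    + '<p class="MsoNormal">bottom text<o:p/></p>'
    '</html>'
)

root = ET.fromstring(SAMPLE)
runs = blank_runs(root)
print(len(runs), len(runs[0]))  # prints: 1 11
```

The first element of each run is the <p> that the XPath expression is
intended to select. Unlike the XPath predicate, this sketch requires the
blank paragraphs to be strictly consecutive and counts the whole run rather
than looking only at the next 10 siblings.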