Re: [xsl] detect sentence surrounding a tag

2016-07-27 03:58:38
On 26/07/16 21:21, Dorothy Hoskins dorothy(_dot_)hoskins(_at_)gmail(_dot_)com wrote:
> HI, in the case of the element A containing multiple sentences (assuming
> "." as end of sentence punctuation), is there a reliable way to find the
> sentence that surrounds the child element B wherever it occurs in A?
>
> I think that the solution (regex?) will have to look backwards from the
> start tag of B and past the end tag of A to the nearest "."
>
> I recognize that if there is some abbreviation or decimal number in the
> sentence that will be interpreted as the end of sentence. That's OK as a
> limitation.

Very crudely, yes (I have taken the liberty of adding a dot after the
question mark and the quoted dot in your example to make them fit the
pattern of "sentence ends with dot"):

========================== test.xml =================================
<A>HI, in the case of the element A containing multiple sentences
  (assuming "." as end of sentence punctuation), is there a reliable
  way to find the sentence that surrounds <B>the child element B</B>
  wherever it occurs in A?. I think that the solution (regex?) will
  have to look backwards from the start tag of <B>B and past the end
    tag of A</B> to the nearest ".". I recognize that if there is some
  abbreviation or decimal number in the sentence that will be
  interpreted as the end of sentence. That's OK as a limitation.</A>
========================== test.xsl ==================================
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <text>
      <xsl:apply-templates/>
    </text>
  </xsl:template>

  <xsl:template match="A">
    <xsl:for-each select="B">
      <sentence>
        <xsl:value-of
          select="tokenize(preceding-sibling::text()[1],'\. ')
                  [position()=last()]"/>
        <xsl:value-of select="."/>
        <xsl:variable name="posttext"
          select="following-sibling::text()[1]"/>
        <xsl:value-of
          select="tokenize($posttext,'\. ')[1]"/>
        <xsl:text>.</xsl:text>
      </sentence>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
============================ output =================================
<?xml version="1.0" encoding="UTF-8"?><text><sentence>HI, in the case of
the element A containing multiple sentences
  (assuming "." as end of sentence punctuation), is there a reliable
  way to find the sentence that surrounds the child element B
  wherever it occurs in A?.</sentence><sentence>I think that the
solution (regex?) will
  have to look backwards from the start tag of B and past the end
    tag of A to the nearest ".".</sentence></text>
=====================================================================

This will probably fail on a significant number of test cases. Making it
work with sentences ending in question marks, exclamation marks, quoted
dots, etc. is left as an exercise...:-)
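
As a possible starting point for that exercise, here is a minimal sketch
that swaps tokenize() for replace(), so that whichever terminator ends the
sentence is kept rather than a hard-coded dot being appended. It assumes a
sentence ends at ".", "?" or "!" followed by whitespace (or at the end of
the text node). The file name test2.xsl and the variable name pretext are
only illustrative. Quoted dots such as "." are still not recognised as
sentence ends, and abbreviations and decimal numbers remain a problem.

========================== test2.xsl =================================
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <text>
      <xsl:apply-templates/>
    </text>
  </xsl:template>

  <xsl:template match="A">
    <xsl:for-each select="B">
      <!-- Assumption: the sentence lies within the text nodes
           immediately around B, as in the stylesheet above -->
      <xsl:variable name="pretext"
        select="preceding-sibling::text()[1]"/>
      <xsl:variable name="posttext"
        select="following-sibling::text()[1]"/>
      <sentence>
        <!-- Drop everything up to the last terminator-plus-whitespace
             before B; the 's' flag lets '.' match line breaks -->
        <xsl:value-of
          select="replace($pretext,'^.*[.?!]\s+','','s')"/>
        <xsl:value-of select="."/>
        <!-- Keep everything up to and including the first terminator
             that is followed by whitespace or the end of the text -->
        <xsl:value-of
          select="replace($posttext,'^(.*?[.?!])(\s.*)?$','$1','s')"/>
      </sentence>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
=====================================================================

If no terminator is found, replace() returns the text node unchanged, so a
B in the first or last sentence of A still gets its surrounding text.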

///Peter
-- 
Peter Flynn | Academic & Collaborative Technologies | University College
Cork IT Services | ☎ +353 21 490 2609 | ✉ pflynn(_at_)ucc(_dot_)ie | 🌍 
www.ucc.ie