On 26/07/16 21:21, Dorothy Hoskins dorothy(_dot_)hoskins(_at_)gmail(_dot_)com
wrote:
HI, in the case of the element A containing multiple sentences (assuming
"." as end of sentence punctuation), is there a reliable way to find the
sentence that surrounds the child element B wherever it occurs in A?
I think that the solution (regex?) will have to look backwards from the
start tag of B and past the end tag of A to the nearest "."
I recognize that if there is some abbreviation or decimal number in the
sentence that will be interpreted as the end of sentence. That's OK as a
limitation.
Very crudely, yes (I have taken the liberty of adding a dot after the
question mark and the quoted dot in your example to make them fit the
pattern of "sentence ends with dot"):
========================== test.xml =================================
<A>HI, in the case of the element A containing multiple sentences
(assuming "." as end of sentence punctuation), is there a reliable
way to find the sentence that surrounds <B>the child element B</B>
wherever it occurs in A?. I think that the solution (regex?) will
have to look backwards from the start tag of <B>B and past the end
tag of A</B> to the nearest ".". I recognize that if there is some
abbreviation or decimal number in the sentence that will be
interpreted as the end of sentence. That's OK as a limitation.</A>
========================== test.xsl ==================================
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <text>
      <xsl:apply-templates/>
    </text>
  </xsl:template>
  <xsl:template match="A">
    <xsl:for-each select="B">
      <sentence>
        <!-- tail of the preceding text node: everything after the last ". " -->
        <xsl:value-of
          select="tokenize(preceding-sibling::text()[1],'\. ')
                  [position()=last()]"/>
        <!-- the content of B itself -->
        <xsl:value-of select="."/>
        <!-- head of the following text node: everything before the first ". " -->
        <xsl:variable name="posttext"
          select="following-sibling::text()[1]"/>
        <xsl:value-of
          select="tokenize($posttext,'\. ')[1]"/>
        <!-- tokenize() drops the separator, so put the dot back -->
        <xsl:text>.</xsl:text>
      </sentence>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
============================ output =================================
<?xml version="1.0" encoding="UTF-8"?><text><sentence>HI, in the case of
the element A containing multiple sentences
(assuming "." as end of sentence punctuation), is there a reliable
way to find the sentence that surrounds the child element B
wherever it occurs in A?.</sentence><sentence>I think that the
solution (regex?) will
have to look backwards from the start tag of B and past the end
tag of A to the nearest ".".</sentence></text>
=====================================================================
This will probably fail on a significant number of test cases. Making it
work with sentences ending in question marks, exclamation marks, quoted
dots, etc. is left as an exercise... :-)
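As a possible starting point for that exercise, here is a minimal sketch
that swaps tokenize() for replace(), so that "?" and "!" also count as
sentence ends and the terminator is kept rather than re-added. It assumes
that a sentence ends at ".", "?" or "!" followed by whitespace (or by the
end of the text node), and that the whole sentence lies in the text nodes
immediately before and after B. The variable names pre and post are
illustrative only. It still ignores terminators followed by a closing quote
or bracket, and a sentence containing two B elements will come out
truncated.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="xml"/>
  <xsl:template match="/">
    <text>
      <xsl:apply-templates/>
    </text>
  </xsl:template>
  <xsl:template match="A">
    <xsl:for-each select="B">
      <!-- string() yields '' when there is no neighbouring text node -->
      <xsl:variable name="pre"
        select="string(preceding-sibling::text()[1])"/>
      <xsl:variable name="post"
        select="string(following-sibling::text()[1])"/>
      <sentence>
        <!-- greedy match: strip everything up to the LAST terminator
             that is followed by whitespace; the 's' flag lets "."
             match newlines as well -->
        <xsl:value-of select="replace($pre, '^.*[.?!]\s+', '', 's')"/>
        <!-- the content of B itself -->
        <xsl:value-of select="."/>
        <!-- reluctant match: keep everything up to and including the
             FIRST terminator that is followed by whitespace or by the
             end of the text node -->
        <xsl:value-of
          select="replace($post, '^(.*?[.?!])(\s.*)?$', '$1', 's')"/>
      </sentence>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because a terminator only counts when whitespace follows it, a decimal
number such as 3.14 should no longer split a sentence, although an
abbreviation followed by a space still will.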
///Peter
--
Peter Flynn | Academic & Collaborative Technologies | University College
Cork IT Services | ☎ +353 21 490 2609 | ✉ pflynn(_at_)ucc(_dot_)ie | 🌍
www.ucc.ie