Re: [xsl] detect sentence surrounding a tag

2016-07-27 08:19:48
Dorothy,
This will do it, and you can then clean the start and end tags out of the text.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs"
    version="2.0">
    <!-- turn text nodes into elements with and without sentence endings and save in a variable -->
    <xsl:template match="root">
        <xsl:variable name="stage-1">
            <xsl:copy>
                <xsl:apply-templates/>
            </xsl:copy>
        </xsl:variable>
        <!-- write the variable to output-01.xml so it can be inspected -->
        <xsl:result-document href="output-01.xml">
            <xsl:copy-of select="$stage-1"/>
        </xsl:result-document>
        <!-- create final output: each group ends at an end element, so one group is one sentence -->
        <xsl:result-document href="output-02.xml">
            <root>
                <xsl:for-each-group select="$stage-1/root/node()" group-ending-with="end">
                    <sentence>
                        <xsl:copy-of select="current-group()"/>
                    </sentence>
                </xsl:for-each-group>
            </root>
        </xsl:result-document>
    </xsl:template>
    <!-- pass through B -->
    <xsl:template match="B">
        <xsl:copy-of select="."/>
    </xsl:template>
    <!-- determine what kind of text with regex -->
    <xsl:template match="text()">
        <!-- assumes a space follows each end of sentence marker; the reluctant .*? stops at the first marker, so each sentence gets its own end element -->
        <xsl:analyze-string select="." regex="(.*?)(\. |\? )">
            <xsl:matching-substring>
                <end>
                    <xsl:copy-of select="."/>
                </end>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <start>
                    <xsl:copy-of select="."/>
                </start>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
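
To make the two stages concrete, here is a small made-up input. The sentences are invented for illustration; only the root and B element names come from the stylesheet above. It is kept on one line so that no indentation ends up in the text nodes.

```xml
<root>First sentence. The second has <B>a tagged phrase</B> in the middle. </root>
```

For that input I would expect output-02.xml to contain the following, give or take the XML declaration and any whitespace the serializer adds. The second sentence element is the one surrounding B.

```xml
<root><sentence><end>First sentence. </end></sentence><sentence><start>The second has </start><B>a tagged phrase</B><end> in the middle. </end></sentence></root>
```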

Terry


On Tuesday, July 26, 2016 4:37 PM, "Michael Kay mike(_at_)saxonica(_dot_)com" 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:



I don't think there's a "reliable" way to recognize sentences in English text, 
but let's not go there... Not today. 

Generally I think there are two approaches:

(a) convert the markup (start and end of B) to text delimiters and then use 
regular expressions.

(b) convert the text delimiters (full stops and other punctuation) to markup 
(empty milestone tags?) and then use XSLT positional grouping or sibling 
recursion.
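
As a rough illustration of approach (a), and not something taken from this thread, the sketch below flattens A to a string, stands in two private-use characters for the start and end tags of B, and then picks out the regex matches that contain the marker. The choice of U+E000 and U+E001, the hits and sentence element names, and the assumption that a sentence ends with ". " or "? " (or at the end of A) are all mine.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <!-- assumption: these two private-use characters never occur in the input -->
    <xsl:variable name="open" select="'&#xE000;'"/>
    <xsl:variable name="close" select="'&#xE001;'"/>

    <xsl:template match="A">
        <!-- flatten A into a temporary tree holding only text, with the B tags replaced by delimiters -->
        <xsl:variable name="flat">
            <xsl:apply-templates mode="flatten"/>
        </xsl:variable>
        <hits>
            <!-- each match runs up to and including ". " or "? ", or up to the end of the string -->
            <xsl:analyze-string select="$flat" regex=".+?(\. |\? |$)" flags="s">
                <xsl:matching-substring>
                    <!-- keep only the sentences in which a B started -->
                    <xsl:if test="contains(., $open)">
                        <sentence>
                            <xsl:value-of select="translate(., concat($open, $close), '')"/>
                        </sentence>
                    </xsl:if>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </hits>
    </xsl:template>

    <!-- B becomes its text wrapped in the two delimiters -->
    <xsl:template match="B" mode="flatten">
        <xsl:value-of select="concat($open, ., $close)"/>
    </xsl:template>

    <!-- text is copied through unchanged -->
    <xsl:template match="text()" mode="flatten">
        <xsl:value-of select="."/>
    </xsl:template>
</xsl:stylesheet>
```

The result is plain text, so the B markup itself is lost, and a ". " inside B would split the sentence there. Both seem acceptable for a sketch, but they are limitations to keep in mind.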

Neither is easy enough for me to attempt without a spare half-an-hour to devote 
to it.
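
For comparison, approach (b) might look something like the sketch below. It is only an outline under my own assumptions: the empty eos milestone element, the hits and sentence wrappers, and the mark mode are names made up for the example, and a sentence is again taken to end with ". " or "? ".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:template match="A">
        <!-- stage 1: copy the content of A, adding an empty eos milestone after each ". " or "? " -->
        <xsl:variable name="marked">
            <xsl:apply-templates mode="mark"/>
        </xsl:variable>
        <hits>
            <!-- stage 2: positional grouping; every group ends at a milestone or at the end of A -->
            <xsl:for-each-group select="$marked/node()" group-ending-with="eos">
                <!-- keep only the groups that contain a B -->
                <xsl:if test="current-group()[self::B]">
                    <sentence>
                        <xsl:copy-of select="current-group()[not(self::eos)]"/>
                    </sentence>
                </xsl:if>
            </xsl:for-each-group>
        </hits>
    </xsl:template>

    <!-- pass B through unchanged -->
    <xsl:template match="B" mode="mark">
        <xsl:copy-of select="."/>
    </xsl:template>

    <!-- emit the text as it is, with a milestone after every sentence ending -->
    <xsl:template match="text()" mode="mark">
        <xsl:analyze-string select="." regex="\. |\? ">
            <xsl:matching-substring>
                <xsl:value-of select="."/>
                <eos/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
                <xsl:value-of select="."/>
            </xsl:non-matching-substring>
        </xsl:analyze-string>
    </xsl:template>
</xsl:stylesheet>
```

Unlike the string-based route, this keeps B as an element in the result, and it copes with B sitting at the very start or end of a sentence because the groups are closed by the milestones rather than opened by the text.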

Michael Kay
Saxonica 


On 26 Jul 2016, at 21:21, Dorothy Hoskins 
dorothy(_dot_)hoskins(_at_)gmail(_dot_)com 
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:

Hi, in the case of the element A containing multiple sentences (assuming "." 
as end of sentence punctuation), is there a reliable way to find the sentence 
that surrounds the child element B wherever it occurs in A?

I think that the solution (regex?) will have to look backwards from the start 
tag of B, and forwards past the end tag of B, to the nearest "."

I recognize that an abbreviation or decimal number in the sentence will be 
interpreted as the end of the sentence. That's OK as a limitation.

Thanks for your help.
- Dorothy

--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--
