Hi David,
Yes, there shouldn't be any cross-paragraph elements.
Rick
-----Original Message-----
From: David Carlisle d(_dot_)p(_dot_)carlisle(_at_)gmail(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: Sunday, November 24, 2019 9:33 AM
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: Re: [xsl] Splitting a paragraph into sentences and keep markup
Can we assume the easy case (as in your example), where all the sentences end
at the top level?
A more challenging example is:
<root>
<p>This has one <span class="zzz">sentence? Actually, it has
<emphasis>two</emphasis>. No,</span> it has three.</p>
</root>
because then you need to force-close any open elements at the sentence end and
re-open them in the new sentence, giving something like:
<p>This has one <span class="zzz">sentence?</span></p>
<p><span class="zzz">Actually, it has <emphasis>two</emphasis>.</span></p>
<p><span class="zzz">No,</span> it has three.</p>
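A minimal sketch of that force-close and re-open idea follows. It is not taken from the thread; it assumes XSLT 2.0, reuses the rq namespace from the original stylesheet, and introduces a hypothetical <rq:break/> marker element plus the mode names "mark" and "slice" purely for illustration. Pass 1 copies the paragraph content and drops a marker after every . or ? that is followed by whitespace, at any depth. Pass 2 rebuilds the tree once per sentence, keeping only the text nodes that sit between the right pair of markers, and the elements that contain them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rq="http://www.frameexpert.com"
    exclude-result-prefixes="rq"
    version="2.0">

  <xsl:strip-space elements="root"/>

  <xsl:template match="/root">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <!-- Pass 1: copy of the content with rq:break markers (hypothetical name) -->
    <xsl:variable name="marked">
      <xsl:apply-templates select="node()" mode="mark"/>
    </xsl:variable>
    <!-- Pass 2: sentence n is everything preceded by exactly n - 1 markers -->
    <xsl:for-each select="1 to count($marked//rq:break) + 1">
      <xsl:variable name="slice">
        <xsl:apply-templates select="$marked/node()" mode="slice">
          <xsl:with-param name="n" select="." tunnel="yes"/>
        </xsl:apply-templates>
      </xsl:variable>
      <!-- Skip an empty trailing slice (paragraph ending in whitespace) -->
      <xsl:if test="$slice/node()">
        <p><xsl:copy-of select="$slice/node()"/></p>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- mark: recursive copy of elements and their attributes -->
  <xsl:template match="*" mode="mark">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="mark"/>
    </xsl:copy>
  </xsl:template>

  <!-- mark: keep the punctuation, drop the whitespace after it, add a marker -->
  <xsl:template match="text()" mode="mark">
    <xsl:analyze-string select="." regex="([.?])\s+">
      <xsl:matching-substring>
        <xsl:value-of select="regex-group(1)"/>
        <rq:break/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

  <!-- slice: re-open an element only if it holds text of sentence n -->
  <xsl:template match="*" mode="slice">
    <xsl:param name="n" tunnel="yes"/>
    <xsl:if test="descendant::text()[count(preceding::rq:break) = $n - 1]">
      <xsl:copy>
        <xsl:copy-of select="@*"/>
        <xsl:apply-templates mode="slice"/>
      </xsl:copy>
    </xsl:if>
  </xsl:template>

  <!-- slice: keep a text node only if it belongs to sentence n -->
  <xsl:template match="text()" mode="slice">
    <xsl:param name="n" tunnel="yes"/>
    <xsl:if test="count(preceding::rq:break) = $n - 1">
      <xsl:value-of select="."/>
    </xsl:if>
  </xsl:template>

  <!-- slice: the markers themselves never reach the output -->
  <xsl:template match="rq:break" mode="slice"/>

</xsl:stylesheet>
```

Two limitations of this sketch: elements with no text descendants (an empty element, say) are dropped by the "slice" test, and counting preceding markers for every node is quadratic, which should be fine for short paragraphs but not for long ones.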
David
On Sun, 24 Nov 2019 at 13:34, Rick Quatro rick(_at_)rickquatro(_dot_)com
<xsl-list-service(_at_)lists(_dot_)mulberrytech(_dot_)com> wrote:
Hi All,
I have a situation where I want to split a short paragraph into sentences and
use them in different parts of my output. I am using <xsl:analyze-string>
because I want to account for a sentence ending with either a . or a ?. This
works unless the paragraph has child elements, like the <emphasis> in the
second sentence, because the markup is lost when the paragraph is converted to
a string. Can I split a paragraph into sentences and still keep the markup?
Here is my input document:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>This has one sentence? Actually, it has
<emphasis>two</emphasis>. No, it has three.</p>
</root>
My stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:rq="http://www.frameexpert.com"
    exclude-result-prefixes="xs rq"
    version="2.0">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="root"/>

  <xsl:template match="/root">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <xsl:variable name="sentences"
        select="rq:splitParagraphIntoSentences(.)"/>
    <p><xsl:value-of select="$sentences[1]"/></p>
    <note>Something in between.</note>
    <p><xsl:value-of select="$sentences[position()>1]"/></p>
  </xsl:template>

  <!-- Returns one <sentence> element per sentence ending in . or ? -->
  <xsl:function name="rq:splitParagraphIntoSentences">
    <xsl:param name="paragraph"/>
    <!-- select atomizes $paragraph to its string value, so child markup is lost here -->
    <xsl:analyze-string select="$paragraph"
        regex=".+?[\.\?](\s+|$)">
      <xsl:matching-substring>
        <sentence><xsl:value-of
            select="replace(.,'\s+$','')"/></sentence>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:function>

</xsl:stylesheet>
My output:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>This has one sentence?</p>
<note>Something in between.</note>
<p>Actually, it has two. No, it has three.</p>
</root>
What I want is this:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<p>This has one sentence? </p>
<note>Something in between.</note>
<p>Actually, it has <emphasis>two</emphasis>. No, it has three.
</p>
</root>
Any suggestions will be appreciated.
Rick
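For the easy case, where every sentence ends in a text node that is a direct child of <p>, one possible approach is sketched below. It is a sketch under assumptions, not a reply from the list: it assumes XSLT 2.0 and uses a hypothetical <rq:break/> marker element and a mode named "mark" that do not appear in the original stylesheet. The top-level text nodes are split with <xsl:analyze-string>, a marker is emitted after each sentence end, and <xsl:for-each-group> with group-ending-with then collects text and child elements into sentences.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rq="http://www.frameexpert.com"
    exclude-result-prefixes="rq"
    version="2.0">

  <xsl:strip-space elements="root"/>

  <xsl:template match="/root">
    <xsl:copy>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p">
    <!-- Pass 1: children of p, with an rq:break marker (hypothetical name)
         after each sentence end found in a top-level text node -->
    <xsl:variable name="marked">
      <xsl:apply-templates select="node()" mode="mark"/>
    </xsl:variable>
    <!-- Pass 2: every run of nodes ending with a marker is one sentence;
         the last run ends at the end of the paragraph -->
    <xsl:variable name="sentences">
      <xsl:for-each-group select="$marked/node()"
          group-ending-with="rq:break">
        <sentence>
          <xsl:copy-of select="current-group()[not(self::rq:break)]"/>
        </sentence>
      </xsl:for-each-group>
    </xsl:variable>
    <!-- copy-of, unlike value-of, keeps the child elements -->
    <p><xsl:copy-of select="$sentences/sentence[1]/node()"/></p>
    <note>Something in between.</note>
    <p><xsl:copy-of select="$sentences/sentence[position() gt 1]/node()"/></p>
  </xsl:template>

  <!-- Child elements are copied whole; sentence ends inside them are ignored -->
  <xsl:template match="*" mode="mark">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- A . or ? followed by whitespace ends a sentence; the whitespace is kept
       so that rejoined sentences stay separated -->
  <xsl:template match="text()" mode="mark">
    <xsl:analyze-string select="." regex="[.?]\s+">
      <xsl:matching-substring>
        <xsl:value-of select="."/>
        <rq:break/>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the input document above, the first <p> should hold the first sentence and the second <p> the remaining two, with <emphasis> intact. The key change from the original stylesheet is that sentences are built from nodes rather than from the paragraph's string value, and are written out with <xsl:copy-of> instead of <xsl:value-of>.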
--~----------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
EasyUnsubscribe: http://lists.mulberrytech.com/unsub/xsl-list/1167547
or by email: xsl-list-unsub(_at_)lists(_dot_)mulberrytech(_dot_)com
--~--