At 2007-10-02 17:05 +0900, Christian Wittern wrote:
In trying to solve the following problem I am seeking your help:
I want to segment paragraphs in a text, so that sentences are
enclosed in a <s> element and within the sentences, words between
interpunction are within <seg> elements.
So far, I have been capturing the content of <p> in a string and
then using two nested <xsl:analyze-string> blocks with regexes,
which work nicely and do what I want. Now I discovered that there
are <note> elements with additional markup in some paragraphs, which
get lost in this process. However, I really want to leave these
notes alone, as they are. So:
<p>Some text. Some more text, with a comma. <note>This stuff, how
boring</note></p>
should look like:
<p><s><seg>Some text.</seg></s><s><seg>Some more text,</seg><seg>
with a comma.</seg></s><note>This stuff, how boring</note></p>
I wonder how I tell the processor to leave the note stuff alone?
From your comment "capturing the content in a string and then..."
I'm assuming you have something like:
<xsl:template match="p">
<xsl:analyze-string select="." .....
</xsl:template>
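If you instead push the children of <p> through templates in their
own mode, you can work on each text node in turn, while elements such
as <note> are handed back to the default mode and left as they are: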
<xsl:template match="p">
<xsl:apply-templates mode="in-p" select="node()"/>
</xsl:template>
<xsl:template mode="in-p" match="*">
<xsl:apply-templates select="."/> <!--reapply in the default mode-->
</xsl:template>
<xsl:template mode="in-p" match="text()">
<xsl:analyze-string select="." .....
</xsl:template>
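
To make that concrete, here is a rough, complete XSLT 2.0 sketch of
the same idea. I have not seen your regexes, so the two used here are
only placeholder assumptions: a sentence is taken to end at . ! or ?
and a segment at , ; or :. I am also assuming your elements are in no
namespace and that an identity template is an acceptable way to copy
<note> and everything else through untouched.

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity: copies <note> and all other nodes unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- children of <p> are processed in their own mode -->
  <xsl:template match="p">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="in-p" select="node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- elements inside <p> go back to the default mode -->
  <xsl:template mode="in-p" match="*">
    <xsl:apply-templates select="."/>
  </xsl:template>

  <!-- only the text nodes directly inside <p> are segmented -->
  <xsl:template mode="in-p" match="text()">
    <!-- placeholder sentence regex (assumption) -->
    <xsl:analyze-string select="." regex="[^.!?]+[.!?]*">
      <xsl:matching-substring>
        <!-- skip whitespace-only matches such as a trailing space -->
        <xsl:if test="normalize-space(.)">
          <s>
            <!-- placeholder segment regex (assumption) -->
            <xsl:analyze-string select="normalize-space(.)"
                                regex="[^,;:]+[,;:]?">
              <xsl:matching-substring>
                <seg><xsl:value-of select="normalize-space(.)"/></seg>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                <!-- stray punctuation is kept, not dropped -->
                <xsl:value-of select="."/>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </s>
        </xsl:if>
      </xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>

One consequence of working on each text node separately is that a
sentence interrupted by a <note> will come out as two <s> elements,
one on either side of the note. If that matters for your data you
would need something more elaborate than this.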
I hope this helps.
. . . . . . . . . . . . Ken
--
G. Ken Holman mailto:gkholman(_at_)CraneSoftwrights(_dot_)com