Hi Wendell,
You want either:
(//text())[1]
(collects all the text nodes, returns only the first)
or
/descendant::text()[1]
(returns the first descendant text node).
OK... but now the problem is that neither of them seems to be valid in a
match pattern.
<xsl:template match="para(//text())[1]"> Saxon says: "The only
functions allowed in a pattern are id() and key()"
<xsl:template match="para/descendant::text()[1]"> Saxon says: "Axis
in pattern must be child or attribute"
(The first one is strange: is text() really a function? And even then,
why is "para//text()[1]" a valid pattern and "para(//text())[1]" isn't?)
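A possible explanation for the first message: text() is a node test, not a function, but in XPath a name followed directly by "(" is read as a function name unless it is a node type, so "para(...)" is parsed as a call to a function named para(). Moving the parenthesis, as in "(para//text())[1]", gives a valid expression, yet still not a valid XSLT 1.0 pattern, since patterns there cannot start with a parenthesized expression. Note also that "para//text()[1]" means something different: every text node that is the first text() child of its own parent. The sketch below counts what each form selects per para (the report, per-parent, overall and by-axis element names are made up for illustration):

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <report>
      <xsl:for-each select="//para">
        <para>
          <!-- every text node that is the first text() child of its own parent -->
          <per-parent><xsl:value-of select="count(.//text()[1])"/></per-parent>
          <!-- the single first text node, in document order, below this para -->
          <overall><xsl:value-of select="count((.//text())[1])"/></overall>
          <!-- the same single node, written with the descendant axis -->
          <by-axis><xsl:value-of select="count(descendant::text()[1])"/></by-axis>
        </para>
      </xsl:for-each>
    </report>
  </xsl:template>
</xsl:stylesheet>
```

For the fourth para of the input below, per-parent should come out as 3 (one text node each directly under para, b and i), while overall and by-axis are 1.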
So I guess I'd have to use one of them in an apply-templates select
attribute (instead of in match), but I'm stuck on how to combine that
with the identity template. I could select "para(//text())[1]", but how
would I select all the rest then? (Something like
"para(//text())[position() > 1]" won't work.)
Input XML:
<section>
<para>A paragraph without any markup</para>
<para> Beware of leading whitespace </para>
<para>A paragraph with some <i>markup</i> inside</para>
<para>A paragraph with some <b><i>nested</i> markup</b></para>
<para><em>This is a special case:</em> paragraph starts with
markup</para>
<para><em>This</em> is difficult: only the first word has markup</para>
</section>
The goal is to isolate the first three words of each paragraph. Desired
output:
<section>
<para><first>A paragraph without </first>any markup</para>
<para><first>Beware of leading </first>whitespace</para>
<para><first>A paragraph with </first>some <i>markup</i> inside</para>
<para><first>A paragraph with </first>some <b><i>nested</i>
markup</b></para>
<para><em><first>This is a </first>special case:</em> paragraph
starts with markup</para>
<para><em><first>This</first></em> is difficult: only the first word
has markup</para>
</section>
The last one is especially difficult; ideally it would be
<para><first><em>This</em> is difficult:</first> only the first word
has markup</para>
Stylesheet so far:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <xsl:param name="split" select="3"/>

  <!-- identity template: copy all elements -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="para/text()[1]"> <!-- < < < -->
    <xsl:call-template name="split-words"/>
  </xsl:template>

  <xsl:template name="split-words">
    <xsl:param name="i" select="0"/>
    <xsl:param name="str1" select="''"/>
    <xsl:param name="str2" select="normalize-space(.)"/>
    <xsl:choose>
      <xsl:when test="$i = $split">
        <first><xsl:value-of select="$str1"/></first>
        <xsl:value-of select="$str2"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:choose>
          <xsl:when test="contains($str2,' ')">
            <xsl:call-template name="split-words">
              <xsl:with-param name="i" select="$i+1"/>
              <xsl:with-param name="str1"
                  select="concat($str1,substring-before($str2,' '),' ')"/>
              <xsl:with-param name="str2"
                  select="substring-after($str2,' ')"/>
            </xsl:call-template>
          </xsl:when>
          <xsl:otherwise>
            <xsl:call-template name="split-words">
              <xsl:with-param name="i" select="$split"/>
              <xsl:with-param name="str1"
                  select="concat($str1,$str2)"/>
              <xsl:with-param name="str2" select="''"/>
            </xsl:call-template>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
Output: correct except for the last two paras
<section>
<para><first>A paragraph without </first>any markup</para>
<para><first>Beware of leading </first>whitespace</para>
<para><first>A paragraph with </first>some<i>markup</i> inside</para>
<para><first>A paragraph with </first>some<b><i>nested</i>
markup</b></para>
<para><em>This is a special case:</em><first>paragraph starts with
</first>markup</para>
<para><em>This</em><first>is difficult: only </first>the first word
has markup</para>
</section>
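The two paras that come out wrong are the ones whose first text node is not a direct child of para. One possible way to catch that node while staying inside a match pattern, offered as a sketch that has not been run against Saxon: keep only child steps and "//" in the pattern itself and move the descendant axis into a predicate, where other axes and functions such as generate-id() are allowed. Here the matched node is simply wrapped in <first>; calling split-words from this template instead would be the natural next step. It does not address the "ideal" case where the first three words span several nodes.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8"/>
  <xsl:strip-space elements="*"/>

  <!-- identity template: copy all elements -->
  <xsl:template match="*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <!-- a text node at any depth below a para that is the first text node,
       in document order, under its nearest para ancestor -->
  <xsl:template match="para//text()[generate-id() =
      generate-id(ancestor::para[1]/descendant::text()[1])]">
    <first><xsl:value-of select="."/></first>
  </xsl:template>

  <!-- all other text nodes fall through to the built-in template,
       which copies them unchanged -->
</xsl:stylesheet>
```

Because the selection happens in match, a plain <xsl:apply-templates/> in the identity template is enough, and no separate select for "all the rest" is needed.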
--
Anton