xsl-list

Re: match string

2004-10-20 13:17:32
Hi Wendell,

You want either:

(//text())[1]

(collects all the text nodes, returns only the first)

or

/descendant::text()[1]

(returns the first descendant text node).

OK... but now the problem is that neither of them seems to be valid in a match pattern.

<xsl:template match="para(//text())[1]">
Saxon says: "The only functions allowed in a pattern are id() and key()"

<xsl:template match="para/descendant::text()[1]">
Saxon says: "Axis in pattern must be child or attribute"

(The first one is strange: is text() really a function? And even then, why is "para//text()[1]" a valid pattern and "para(//text())[1]" isn't?)

So I guess I'd have to use one of them in an apply-templates select attribute (instead of in match) but I'm stuck on how to combine that with the identity template. I could select "para(//text())[1]" but how would I select all the rest then (something like "para(//text())[position() > 1]" won't work).
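
One direction that might work, given here only as an untested sketch: as far as I understand the XSLT 1.0 rules, the axis restriction applies to the steps of a pattern but not to what is inside a predicate. So the descendant axis could be moved into a predicate that compares the current text node with the first descendant text node of its nearest para ancestor. The <first> wrapper below is just a placeholder for the call to split-words, and the generate-id() comparison is my assumption about how to test node identity in 1.0.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:strip-space elements="*"/>

   <!-- identity template: copy all elements -->
   <xsl:template match="*">
       <xsl:copy>
           <xsl:copy-of select="@*"/>
           <xsl:apply-templates/>
       </xsl:copy>
   </xsl:template>

   <!-- Matches a text node anywhere below a para, but only when it is the
        first text node descendant of its nearest para ancestor.
        The steps use only child and //; the descendant and ancestor axes
        appear inside the predicate only. -->
   <xsl:template match="para//text()[generate-id() = generate-id(ancestor::para[1]/descendant::text()[1])]">
       <!-- placeholder: the real stylesheet would call split-words here -->
       <first><xsl:value-of select="."/></first>
   </xsl:template>

</xsl:stylesheet>
```

All other text nodes would fall through to the built-in text rule, so the identity template should not need a separate "select the rest" step. For the paragraph that starts with <em>, the matched node would be the text inside <em>, not the text that follows it.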

Input XML:

<section>
   <para>A paragraph without any markup</para>
   <para>   Beware of leading whitespace   </para>
   <para>A paragraph with some <i>markup</i> inside</para>
   <para>A paragraph with some <b><i>nested</i> markup</b></para>
   <para><em>This is a special case:</em> paragraph starts with markup</para>
   <para><em>This</em> is difficult: only the first word has markup</para>
</section>

The goal is to isolate the first 3 words of each paragraph. Desired output:

<section>
   <para><first>A paragraph without </first>any markup</para>
   <para><first>Beware of leading </first>whitespace</para>
   <para><first>A paragraph with </first>some <i>markup</i> inside</para>
   <para><first>A paragraph with </first>some <b><i>nested</i> markup</b></para>
   <para><em><first>This is a </first>special case:</em> paragraph starts with markup</para>
   <para><em><first>This</first></em> is difficult: only the first word has markup</para>
</section>

The last one is especially difficult; ideally it would be:
<para><first><em>This</em> is difficult:</first> only the first word has markup</para>

Stylesheet so far:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" version="1.0" encoding="utf-8" indent="yes"/>
   <xsl:strip-space elements="*"/>

   <xsl:param name="split" select="3"/>

   <!-- identity template: copy all elements -->
   <xsl:template match="*">
       <xsl:copy>
           <xsl:copy-of select="@*"/>
           <xsl:apply-templates/>
       </xsl:copy>
   </xsl:template>

   <xsl:template match="para/text()[1]">  <!--  <  <  <  -->
       <xsl:call-template name="split-words"/>
   </xsl:template>

   <xsl:template name="split-words">
       <xsl:param name="i" select="0"/>
       <xsl:param name="str1" select="''"/>
       <xsl:param name="str2" select="normalize-space(.)"/>
       <xsl:choose>
           <xsl:when test="$i = $split">
               <first><xsl:value-of select="$str1"/></first>
               <xsl:value-of select="$str2"/>
           </xsl:when>
           <xsl:otherwise>
               <xsl:choose>
                   <xsl:when test="contains($str2,' ')">
                       <xsl:call-template name="split-words">
                           <xsl:with-param name="i" select="$i+1"/>
                           <xsl:with-param name="str1" select="concat($str1,substring-before($str2,' '),' ')"/>
                           <xsl:with-param name="str2" select="substring-after($str2,' ')"/>
                       </xsl:call-template>
                   </xsl:when>
                   <xsl:otherwise>
                       <xsl:call-template name="split-words">
                           <xsl:with-param name="i" select="$split"/>
                           <xsl:with-param name="str1" select="concat($str1,$str2)"/>
                           <xsl:with-param name="str2" select="''"/>
                       </xsl:call-template>
                   </xsl:otherwise>
               </xsl:choose>
           </xsl:otherwise>
       </xsl:choose>
   </xsl:template>

</xsl:stylesheet>

Output: correct except for the last 2 paras

<section>
  <para><first>A paragraph without </first>any markup</para>
  <para><first>Beware of leading </first>whitespace</para>
  <para><first>A paragraph with </first>some<i>markup</i> inside</para>
  <para><first>A paragraph with </first>some<b><i>nested</i> markup</b></para>
  <para><em>This is a special case:</em><first>paragraph starts with </first>markup</para>
  <para><em>This</em><first>is difficult: only </first>the first word has markup</para>
</section>

--
Anton



