xsl-list

RE: [xsl] xsl:analyze-string problem

2007-02-08 10:01:21
I would tackle this as follows:

Step 1: classify the element. Use xsl:choose and matches() to decide which
of the four categories it belongs to, and copy the element adding an
attribute to indicate the category.
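
A minimal sketch of step 1, assuming that an "ordinal" means one or more digits followed by a period, and using a hypothetical attribute name "class" for the category (neither is fixed by the original post):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Step 1 only: copy each e, adding a "class" attribute (hypothetical
       name) holding the category number 1 to 4 from the original post. -->
  <xsl:template match="e">
    <e>
      <xsl:attribute name="class">
        <xsl:choose>
          <!-- 3: nothing but a number and a period -->
          <xsl:when test="matches(., '^\s*[0-9]+\.\s*$')">3</xsl:when>
          <!-- 2: words and/or numbers, ending in a number and a period -->
          <xsl:when test="matches(., '[0-9]+\.\s*$')">2</xsl:when>
          <!-- 4: a number and a period, followed by words and/or numbers -->
          <xsl:when test="matches(., '^\s*[0-9]+\.')">4</xsl:when>
          <!-- 1: just words and/or numbers -->
          <xsl:otherwise>1</xsl:otherwise>
        </xsl:choose>
      </xsl:attribute>
      <!-- the string value is copied unchanged, so spaces are preserved -->
      <xsl:value-of select="."/>
    </e>
  </xsl:template>

</xsl:stylesheet>
```

The order of the xsl:when branches matters: the test for category 3 has to come first, because content of that kind also satisfies the tests for 2 and 4. Content that both starts and ends with an ordinal is outside the four categories and would be reported as 2 here.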

Step 2: do the grouping (concatenation of adjacent elements according to
your rule C). Probably using xsl:for-each-group group-adjacent, but I'm not
entirely clear about the criteria.
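
A rough sketch of step 2 under one possible reading of rule C, namely that a group closes after any element whose content ends in an ordinal (categories 2 and 3). That reading is an assumption, and it leads to group-ending-with rather than group-adjacent. The sketch also assumes the "e" elements already carry the hypothetical "class" attribute from step 1 and are siblings under a common parent:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Matches any element that has classified e children; the name of the
       parent element is not given in the original post. -->
  <xsl:template match="*[e]">
    <xsl:copy>
      <!-- Assumed reading of rule C: a group ends with an e of class 2 or 3. -->
      <xsl:for-each-group select="e"
                          group-ending-with="e[@class = ('2', '3')]">
        <e>
          <!-- Concatenate the contents of the group with no separator,
               so the original spacing is kept as it was. -->
          <xsl:value-of select="current-group()" separator=""/>
        </e>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Any children of the parent other than "e" are dropped by this sketch, so it would need extending if the real input has them.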

Step 3: use analyze-string on the contents of the grouped elements to insert
<ordinal> and <text> element children.
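
A sketch of step 3, again assuming an ordinal is one or more digits followed by a period. Each match becomes an "ordinal" element, the text between matches becomes a "text" element, and whitespace-only stretches are passed through as plain text rather than wrapped:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Applied to the grouped e elements produced by step 2. -->
  <xsl:template match="e">
    <xsl:analyze-string select="string(.)" regex="[0-9]+\.">

      <!-- a number followed by a period -->
      <xsl:matching-substring>
        <ordinal><xsl:value-of select="."/></ordinal>
      </xsl:matching-substring>

      <!-- everything between the ordinals -->
      <xsl:non-matching-substring>
        <xsl:choose>
          <!-- whitespace only: keep it, but do not wrap it in an element -->
          <xsl:when test="matches(., '^\s+$')">
            <xsl:value-of select="."/>
          </xsl:when>
          <xsl:otherwise>
            <text><xsl:value-of select="."/></text>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:non-matching-substring>

    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```

Because the regex is not anchored, analyze-string finds every ordinal in the string, so no regex-group() calls are needed. Leading and trailing spaces of a text stretch stay inside the "text" element here, and a decimal such as "3.5" would be split at the period, which may or may not matter for the real data.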

Michael Kay
http://www.saxonica.com/ 

-----Original Message-----
From: Yves Forkl [mailto:Y(_dot_)Forkl(_at_)srz(_dot_)de] 
Sent: 08 February 2007 16:48
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] xsl:analyze-string problem

Hi XSLT 2.0 wizards,

while the syntax and semantics of xsl:analyze-string have 
become clear to me, I am now in search of an idiom involving 
it that could help me solve this problem. (Or maybe of an 
alternative...)

In the input I find elements like these:

1) <e> def ghi</e>
2) <e> abc 22 def 3 ghi 1. </e>
3) <e> 2. </e>
4) <e> 3. def 35 78 ghi </e>

The possible contents fit into exactly 4 classes:

1) just some words and/or numbers
2) like 1), but followed by a number and a period
3) just a number and a period
4) like 3), but followed by some words and/or numbers

In each case, spaces may or may not appear at beginning and 
end of the content and must be preserved (no matter to which 
group they get attached).

The problem consists of replacing the original "e" element by 
creating new elements according to these rules:

A) A number followed by a period goes into an "ordinal" element.
B) Words and numbers go into a "text" element.
C) In cases 1) and 4), where words and numbers appear at the 
end, the content of the current "e" element must be 
concatenated with all adjacent "e" elements of type 1) and 2) 
before putting it all into the "text" element. By contrast, 
in cases 2) and 3), which end with a number and a period, 
the contents of the following "e" instance should never be appended.

My approach is to use the following templates:

<xsl:template match="e">

   <xsl:analyze-string select="." regex="^(.*?)( *[0-9]\. *)(.*)$">

     <xsl:matching-substring>

       <xsl:for-each select="regex-group(1)">
         <xsl:call-template name="create_element_and_space">
           <xsl:with-param name="new_element_name" select="'text'"/>
         </xsl:call-template>
       </xsl:for-each>

       <xsl:for-each select="regex-group(2)">
         <xsl:call-template name="create_element_and_space">
           <xsl:with-param name="new_element_name" select="'ordinal'"/>
         </xsl:call-template>
       </xsl:for-each>

       <xsl:for-each select="regex-group(3)">
         <xsl:call-template name="create_element_and_space">
           <xsl:with-param name="new_element_name" select="'text'"/>
         </xsl:call-template>
       </xsl:for-each>

     </xsl:matching-substring>

   </xsl:analyze-string>

   <xsl:apply-templates select="following-sibling::e[1]"/>

</xsl:template>


<!-- helper template for squeezing spaces out into mixed content -->
<xsl:template name="create_element_and_space">
   <xsl:param name="new_element_name"/>

   <xsl:analyze-string select="." regex="^\s+|\s+$">

     <xsl:matching-substring>
       <xsl:value-of select="."/>
     </xsl:matching-substring>

     <xsl:non-matching-substring>
       <xsl:element name="{$new_element_name}">
         <xsl:value-of select="."/>
       </xsl:element>
     </xsl:non-matching-substring>

   </xsl:analyze-string>

</xsl:template>


What is not clear to me is:

- whether the regex actually suffices to match the rules

- if it is a good idea to use xsl:for-each there

- how to ensure concatenation of all the "e" instances' 
contents in cases 1) and 4) without processing them 
repeatedly - i.e.: how can I restrict the call to 
xsl:apply-templates to cases 2) and 3)?

Any comments would be greatly appreciated.

   Yves

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--



