I would tackle this as follows:
Step 1: classify the element. Use xsl:choose and matches() to decide which
of the four categories it belongs to, and copy the element, adding an
attribute to indicate the category.
Step 2: do the grouping (concatenation of adjacent elements according to
your rule C), probably using xsl:for-each-group group-adjacent, though I'm
not entirely clear about the criteria.
Step 3: use analyze-string on the contents of the grouped elements to insert
<ordinal> and <text> element children.
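A minimal XSLT 2.0 sketch of how these three steps might look. It is an illustration under stated assumptions: the parent of the e elements is assumed to contain nothing but e elements, the class attribute and the *[e] parent match are illustrative choices, and it uses group-starting-with instead of the group-adjacent mentioned in Step 2. Rule C is read as "a run of concatenated elements ends after any element of class 2) or 3) and before any element of class 3) or 4)".

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy everything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- assumption: the parent of the e elements contains nothing else -->
  <xsl:template match="*[e]">
    <xsl:copy>
      <xsl:copy-of select="@*"/>

      <!-- Step 1: copy each e into a temporary tree with a class attribute -->
      <xsl:variable name="classified">
        <xsl:for-each select="e">
          <e>
            <xsl:attribute name="class">
              <xsl:choose>
                <xsl:when test="matches(., '^\s*[0-9]+\.\s*$')">3</xsl:when>
                <xsl:when test="matches(., '^\s*[0-9]+\.')">4</xsl:when>
                <xsl:when test="matches(., '[0-9]+\.\s*$')">2</xsl:when>
                <xsl:otherwise>1</xsl:otherwise>
              </xsl:choose>
            </xsl:attribute>
            <xsl:value-of select="."/>
          </e>
        </xsl:for-each>
      </xsl:variable>

      <!-- Step 2: a new group starts at class 3/4, or right after class 2/3 -->
      <xsl:for-each-group select="$classified/e"
          group-starting-with="e[@class = ('3', '4')
                                 or preceding-sibling::e[1]/@class = ('2', '3')]">

        <!-- Step 3: split the joined contents into ordinals and text -->
        <xsl:analyze-string select="string-join(current-group(), '')"
                            regex="[0-9]+\.">
          <xsl:matching-substring>
            <ordinal><xsl:value-of select="."/></ordinal>
          </xsl:matching-substring>
          <xsl:non-matching-substring>
            <!-- leading/trailing spaces stay outside the text element -->
            <xsl:analyze-string select="." regex="^\s+|\s+$">
              <xsl:matching-substring>
                <xsl:value-of select="."/>
              </xsl:matching-substring>
              <xsl:non-matching-substring>
                <text><xsl:value-of select="."/></text>
              </xsl:non-matching-substring>
            </xsl:analyze-string>
          </xsl:non-matching-substring>
        </xsl:analyze-string>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

group-starting-with is used here because the boundary test needs to look at the previous element; with group-adjacent one would first have to compute a grouping key for each element, for example a count of the run boundaries preceding it.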
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Yves Forkl [mailto:Y(_dot_)Forkl(_at_)srz(_dot_)de]
Sent: 08 February 2007 16:48
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] xsl:analyze-string problem
Hi XSLT 2.0 wizards,
while the syntax and semantics of xsl:analyze-string have
become clear to me, I am now looking for an idiom using it
that could help me solve the following problem (or perhaps for
an alternative...).
In the input I find elements like these:
1) <e> def ghi</e>
2) <e> abc 22 def 3 ghi 1. </e>
3) <e> 2. </e>
4) <e> 3. def 35 78 ghi </e>
The possible contents fit into exactly 4 classes:
1) just some words and/or numbers
2) like 1), but followed by a number and a period
3) just a number and a period
4) like 3), but followed by some words and/or numbers
In each case, spaces may or may not appear at the beginning and
end of the content and must be preserved (no matter which
group they get attached to).
The problem consists of replacing the original "e" element with
new elements created according to these rules:
A) A number followed by a period goes into an "ordinal" element.
B) Words and numbers go into a "text" element.
C) In cases 1) and 4), where words and numbers appear at the
end, the content of the current "e" element must be
concatenated with the contents of all adjacent "e" elements of
types 1) and 2) before it all goes into the "text" element. By
contrast, in cases 2) and 3), which end with a number and a
period, the content of the following "e" element must never be
appended.
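Read literally, and assuming the four sample elements are consecutive siblings in the order listed inside a hypothetical parent element p, rules A) to C) would produce roughly the following mixed content. Elements 1) and 2) merge into one "text" element; the spaces between the tags are the original leading and trailing spaces.

```xml
<p> <text>def ghi abc 22 def 3 ghi</text> <ordinal>1.</ordinal>  <ordinal>2.</ordinal>  <ordinal>3.</ordinal> <text>def 35 78 ghi</text> </p>
```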
My approach is to use the following templates:
<xsl:template match="e">
  <!-- group 1: text before the ordinal; group 2: the ordinal with
       its surrounding spaces; group 3: text after it -->
  <xsl:analyze-string select="." regex="^(.*?)( *[0-9]\. *)(.*)$">
    <xsl:matching-substring>
      <xsl:for-each select="regex-group(1)">
        <xsl:call-template name="create_element_and_space">
          <xsl:with-param name="new_element_name" select="'text'"/>
        </xsl:call-template>
      </xsl:for-each>
      <xsl:for-each select="regex-group(2)">
        <xsl:call-template name="create_element_and_space">
          <xsl:with-param name="new_element_name" select="'ordinal'"/>
        </xsl:call-template>
      </xsl:for-each>
      <xsl:for-each select="regex-group(3)">
        <xsl:call-template name="create_element_and_space">
          <xsl:with-param name="new_element_name" select="'text'"/>
        </xsl:call-template>
      </xsl:for-each>
    </xsl:matching-substring>
  </xsl:analyze-string>
  <xsl:apply-templates select="following-sibling::e[1]"/>
</xsl:template>
<!-- helper template for squeezing spaces out into mixed content -->
<xsl:template name="create_element_and_space">
  <xsl:param name="new_element_name"/>
  <xsl:analyze-string select="." regex="^\s+|\s+$">
    <xsl:matching-substring>
      <xsl:value-of select="."/>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:element name="{$new_element_name}">
        <xsl:value-of select="."/>
      </xsl:element>
    </xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
What is not clear to me is:
- whether the regex actually suffices to match the rules
- whether it is a good idea to use xsl:for-each there
- how to ensure that the contents of all the "e" instances are
  concatenated in cases 1) and 4) without processing them
  repeatedly, i.e. how can I restrict the call to
  xsl:apply-templates to cases 2) and 3)?
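A sketch of a sibling-recursion variant that avoids processing any "e" twice, assuming the run boundaries of rule C: the parent applies templates only to "e" elements that begin a run, and the "e" template recurses to the next sibling only while the current element ends with words or numbers and the next one starts with them. It would replace the match="e" template above and reuses the create_element_and_space helper; the *[e] parent match, the acc parameter and the [0-9]+\. ordinal regex (which also accepts multi-digit ordinals such as "12.") are illustrative assumptions.

```xml
<!-- assumption: the parent of the e elements contains nothing else -->
<xsl:template match="*[e]">
  <xsl:copy>
    <xsl:copy-of select="@*"/>
    <!-- visit only e elements that start a run: no preceding e, an
         ordinal at the start, or a preceding e ending with an ordinal -->
    <xsl:apply-templates select="e[not(preceding-sibling::e[1])
        or matches(., '^\s*[0-9]+\.')
        or matches(preceding-sibling::e[1], '[0-9]+\.\s*$')]"/>
  </xsl:copy>
</xsl:template>

<xsl:template match="e">
  <!-- acc: contents of the earlier e elements in the same run -->
  <xsl:param name="acc" select="''"/>
  <xsl:variable name="so-far" select="concat($acc, .)"/>
  <xsl:variable name="next" select="following-sibling::e[1]"/>
  <xsl:choose>
    <!-- this e ends with words/numbers and the next one starts with
         them: continue the run; the next e is reached only from here -->
    <xsl:when test="$next and not(matches(., '[0-9]+\.\s*$'))
                    and not(matches($next, '^\s*[0-9]+\.'))">
      <xsl:apply-templates select="$next">
        <xsl:with-param name="acc" select="$so-far"/>
      </xsl:apply-templates>
    </xsl:when>
    <xsl:otherwise>
      <!-- end of the run: split it into ordinals and text -->
      <xsl:analyze-string select="$so-far" regex="[0-9]+\.">
        <xsl:matching-substring>
          <ordinal><xsl:value-of select="."/></ordinal>
        </xsl:matching-substring>
        <xsl:non-matching-substring>
          <xsl:call-template name="create_element_and_space">
            <xsl:with-param name="new_element_name" select="'text'"/>
          </xsl:call-template>
        </xsl:non-matching-substring>
      </xsl:analyze-string>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

The parent's selection predicate and the recursion condition are exact complements, so every "e" is reached exactly once, either directly from the parent or through the recursion. No xsl:for-each is needed, because xsl:matching-substring and xsl:non-matching-substring already set the context item to the current substring.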
Any comments would be greatly appreciated.
Yves
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--