This problem has come up in the past and it's not particularly easy. There
seem to be two main approaches:
(a) convert the string delimiters into element markup, and then use grouping
facilities (xsl:for-each-group) to analyze the overall structure
(b) convert the markup into string delimiters, and then use
xsl:analyze-string.
Both work, but I think (a) is probably a bit easier.
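To make approach (b) concrete, here is a rough model in Python rather than XSLT, using the standard xml.etree.ElementTree module. The placeholder character and the function name are hypothetical. Each top-level element is replaced by a placeholder in one flat string, the string is split at commas, and the elements are substituted back in order. This sketch assumes that commas occur only in top-level text and that the placeholder never occurs in the real text.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: a private-use character assumed never to occur
# in the real text.
PLACEHOLDER = "\ue000"


def split_via_placeholders(parent):
    """Approach (b): markup -> string delimiters, split on commas, then
    substitute the elements back. Returns one list of items (strings and
    Elements) per entry."""
    children = list(parent)
    # One flat string: top-level text plus one placeholder per element.
    # Commas inside an element's own content are not seen (assumption).
    flat = (parent.text or "") + "".join(
        PLACEHOLDER + (child.tail or "") for child in children
    )
    remaining = iter(children)
    entries = []
    for piece in flat.split(","):
        parts = piece.split(PLACEHOLDER)
        items = [parts[0]] if parts[0] else []
        for text in parts[1:]:
            # Each placeholder stands for the next element in document order.
            # Its .tail is already in `text`; ignore .tail when serializing.
            items.append(next(remaining))
            if text:
                items.append(text)
        entries.append(items)
    return entries
```

The bookkeeping of putting elements back is what makes (b) the more fiddly of the two.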
Do all the delimiters (commas) occur in top-level text nodes, or can they
occur nested within elements? I'll assume the former.
Start by making a copy of the data in which the commas are replaced by
<comma/> elements:
<xsl:template match="tbentry">
  <xsl:variable name="temp">
    <!-- copy the content, turning each comma into a <comma/> element -->
    <xsl:apply-templates mode="replace-commas"/>
  </xsl:variable>
  ..[G]..
</xsl:template>

<!-- elements are copied unchanged -->
<xsl:template match="*" mode="replace-commas">
  <xsl:copy-of select="."/>
</xsl:template>

<!-- text nodes are split at each comma -->
<xsl:template match="text()" mode="replace-commas">
  <xsl:analyze-string select="." regex=",">
    <xsl:matching-substring><comma/></xsl:matching-substring>
    <xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>
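As a rough cross-check of what the replace-commas mode produces, here is a Python model (not XSLT) using xml.etree.ElementTree; the function name replace_commas is hypothetical. It flattens the children of tbentry into a list in which each comma in top-level text becomes a new comma element, while child elements are kept whole. Like the templates, it assumes commas occur only in top-level text.

```python
import xml.etree.ElementTree as ET


def replace_commas(parent):
    """Model of the replace-commas mode: return the children of `parent`
    as a flat list in which each comma in top-level text becomes a new
    <comma/> Element. Child elements are kept whole (like xsl:copy-of);
    commas inside them are left alone."""
    out = []

    def split_text(text):
        # Like xsl:analyze-string regex=",": commas become <comma/>,
        # and the non-empty text between them is kept as strings.
        for i, piece in enumerate(text.split(",")):
            if i > 0:
                out.append(ET.Element("comma"))
            if piece:
                out.append(piece)

    split_text(parent.text or "")
    for child in parent:
        out.append(child)  # child.tail is handled as separate text below
        split_text(child.tail or "")
    return out
```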
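Then (at [G] above) process the temporary tree using grouping. Every group
except the first starts with a <comma/>, so leave that marker out of the copy:
<row>
  <xsl:for-each-group select="$temp/child::node()"
      group-starting-with="comma">
    <entry><xsl:copy-of select="current-group()[not(self::comma)]"/></entry>
  </xsl:for-each-group>
</row>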
Not tested!
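For readers who want to see the grouping logic run end to end, here is a Python model of the same two steps using xml.etree.ElementTree. It is an illustration of the technique, not the XSLT itself; the function names are hypothetical. group_starting_with mimics group-starting-with: a new group begins at each comma marker, and anything before the first marker forms the first group.

```python
import copy
import xml.etree.ElementTree as ET


def replace_commas(parent):
    """Flatten the children of `parent`, turning each comma in top-level
    text into a <comma/> Element (same model as the replace-commas mode)."""
    out = []

    def split_text(text):
        for i, piece in enumerate(text.split(",")):
            if i > 0:
                out.append(ET.Element("comma"))
            if piece:
                out.append(piece)

    split_text(parent.text or "")
    for child in parent:
        out.append(child)
        split_text(child.tail or "")
    return out


def is_comma(item):
    return isinstance(item, ET.Element) and item.tag == "comma"


def group_starting_with(items, starts_group):
    """Model of xsl:for-each-group group-starting-with: a new group begins
    at each item for which starts_group() is true; any items before the
    first such item form the first group."""
    groups = []
    for item in items:
        if starts_group(item) or not groups:
            groups.append([])
        groups[-1].append(item)
    return groups


def build_row(tbentry):
    """Turn one tbentry into <row><entry>...</entry>...</row>."""
    row = ET.Element("row")
    for group in group_starting_with(replace_commas(tbentry), is_comma):
        entry = ET.SubElement(row, "entry")
        for item in group:
            if is_comma(item):
                continue  # the marker itself is not copied
            if isinstance(item, str):
                # Mixed content: text goes in .text or the last child's .tail
                if len(entry):
                    entry[-1].tail = (entry[-1].tail or "") + item
                else:
                    entry.text = (entry.text or "") + item
            else:
                element = copy.deepcopy(item)
                element.tail = None  # its tail text is a separate item
                entry.append(element)
    return row
```

In this model whitespace around each entry is not trimmed, and a trailing run such as ", , ," yields extra empty or whitespace-only entries; both would need extra handling if unwanted.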
Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Ilya Lifshits [mailto:chehlo(_at_)gmail(_dot_)com]
Sent: 14 February 2008 22:38
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Parsing complex line (mixed text and markup)
Hello experts,
I'm using XSLT 2.0 processors, both Saxon and Altova.
I'm trying to parse a complex line like:
<tbentry>Some text, Some more text <xref linkend="somelink"/>
even more text , , ,</tbentry>
and get the following output:
<row>
  <entry>Some text</entry>
  <entry>Some more text <xref linkend="ut_man_related_docs"/> and even more text</entry>
</row>
The number of entries is not fixed.
I easily found a solution for text without markup by using the
tokenize() function, but I failed to separate text and markup with
this approach.
An example can be found here: http://pastebin.com/m40fd204f
To state the goal more formally: I want to make life simpler for our
tech writers by creating wrappers on top of DocBook that transform my
own syntax into standard DocBook markup.
So if there is another, more appropriate way (other than a WYSIWYG
editor) to achieve this, I can completely change the source line:
<tblrow>Some text, Some more text <xref linkend="somelink"/>
even more text</tblrow> as long as it's still easy to write. :)
The only solution I found is to pass the linkend value as an
attribute of tblrow, along with another attribute that specifies
the entry number.
But this is a very limited solution; for example, it will not allow
me to use xref in two entries.
One more note: I'm an absolute newbie in XML.
Thanks in advance,
Ilya.
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--