xsl-list

RE: [xsl] Parsing complex line (mixed text and markup)

2008-02-14 16:15:01

This problem has come up in the past and it's not particularly easy. There
seem to be two main approaches:

(a) convert the string delimiters into element markup, and then use grouping
facilities (xsl:for-each-group) to analyze the overall structure

(b) convert the markup into string delimiters, and then use
xsl:analyze-string.

Both work, but I think (a) is probably a bit easier. 
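
For comparison, approach (b) could be sketched roughly as follows, assuming
commas only need to be recognized in text nodes that are direct children of
tbentry. Each child element is replaced by a numbered placeholder, the
resulting string is split on commas, and xsl:analyze-string puts the elements
back. The placeholder characters U+E000 and U+E001 are an arbitrary choice
for illustration and are assumed not to occur in the data; the row wrapper
and the whitespace-only filter are likewise assumptions. Untested.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="tbentry">
    <!-- Child elements, remembered by position so they can be restored later -->
    <xsl:variable name="elems" select="*"/>

    <!-- Flatten the content: the n-th child element becomes U+E000 n U+E001 -->
    <xsl:variable name="flat">
      <xsl:for-each select="node()">
        <xsl:choose>
          <xsl:when test="self::*">
            <xsl:value-of select="concat('&#xE000;',
                                         count(preceding-sibling::*) + 1,
                                         '&#xE001;')"/>
          </xsl:when>
          <xsl:when test="self::text()">
            <xsl:value-of select="."/>
          </xsl:when>
        </xsl:choose>
      </xsl:for-each>
    </xsl:variable>

    <row>
      <!-- Split on commas, skipping tokens that are empty or whitespace only -->
      <xsl:for-each select="tokenize($flat, ',')[normalize-space(.) != '']">
        <entry>
          <!-- Swap each placeholder back for the element it stands for -->
          <xsl:analyze-string select="." regex="&#xE000;(\d+)&#xE001;">
            <xsl:matching-substring>
              <xsl:variable name="n" select="xs:integer(regex-group(1))"/>
              <xsl:copy-of select="$elems[$n]"/>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </entry>
      </xsl:for-each>
    </row>
  </xsl:template>

</xsl:stylesheet>
```

The bookkeeping needed to get the elements back is the main reason this
route tends to be more fiddly than (a).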

Do all the delimiters (commas) occur in top-level text nodes, or can they
occur nested within elements? I'll assume the former.

Start by making a copy of the data in which the commas are replaced by
<comma/> elements:

<xsl:template match="tbentry">
  <xsl:variable name="temp">
    <xsl:apply-templates mode="replace-commas"/>
  </xsl:variable>
  ..[G]..
</xsl:template>

<xsl:template match="*" mode="replace-commas">
  <xsl:copy-of select="."/>
</xsl:template>

<xsl:template match="text()" mode="replace-commas">
  <xsl:analyze-string select="." regex=",">
    <xsl:matching-substring><comma/></xsl:matching-substring>
    <xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>

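Then (at [G] above) process the new tbentry content using grouping:

  <xsl:for-each-group select="$temp/child::node()"
                      group-starting-with="comma">
    <!-- every group after the first starts with its <comma/> marker; leave it out -->
    <entry><xsl:copy-of select="current-group()[not(self::comma)]"/></entry>
  </xsl:for-each-group>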
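
Putting the two steps together, a complete stylesheet might look something
like the sketch below. It assumes the input is well-formed (so the xref is
written as an empty element, <xref linkend="somelink"/>), that the source has
no element of its own called comma, and that the entries should be wrapped in
a row element as in the desired output. The test that skips whitespace-only
groups is an addition to cope with the trailing ", , ," in the sample; the
entries are not whitespace-trimmed.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="tbentry">
    <!-- Pass 1: copy the content, turning top-level commas into <comma/> -->
    <xsl:variable name="temp">
      <xsl:apply-templates mode="replace-commas"/>
    </xsl:variable>

    <!-- Pass 2: start a new group at each <comma/> marker -->
    <row>
      <xsl:for-each-group select="$temp/node()"
                          group-starting-with="comma">
        <!-- The group's content without the marker itself -->
        <xsl:variable name="content"
                      select="current-group()[not(self::comma)]"/>
        <!-- Emit an entry only if it holds an element or non-blank text -->
        <xsl:if test="$content[self::*]
                      or normalize-space(string-join($content, '')) != ''">
          <entry><xsl:copy-of select="$content"/></entry>
        </xsl:if>
      </xsl:for-each-group>
    </row>
  </xsl:template>

  <!-- Child elements are copied whole, so commas inside them are left alone -->
  <xsl:template match="*" mode="replace-commas">
    <xsl:copy-of select="."/>
  </xsl:template>

  <!-- Each comma in a top-level text node becomes a <comma/> element -->
  <xsl:template match="text()" mode="replace-commas">
    <xsl:analyze-string select="." regex=",">
      <xsl:matching-substring><comma/></xsl:matching-substring>
      <xsl:non-matching-substring>
        <xsl:value-of select="."/>
      </xsl:non-matching-substring>
    </xsl:analyze-string>
  </xsl:template>

</xsl:stylesheet>
```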

Not tested!

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: Ilya Lifshits [mailto:chehlo(_at_)gmail(_dot_)com] 
Sent: 14 February 2008 22:38
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] Parsing complex line (mixed text and markup)

Hello experts,

I'm using XSLT 2.0 processors, both Saxon and Altova.

I'm trying to parse a complex line like:
<tbentry>Some text, Some more text <xref linkend="somelink">
even more text , , ,</tbentry>

and get the following output:

<row>
        <entry>Some text</entry>
        <entry>Some more text <xref linkend="ut_man_related_docs"> and even more text </entry>
</row>

The number of entries is not constant.

I easily found a solution for the case without mixed text and
markup, using the tokenize function, but failed to separate
text and markup with this approach.
An example can be found here: http://pastebin.com/m40fd204f

To formalize the goal: I want to simplify the life of our tech
writers by creating wrappers on top of DocBook that will
help transform from my own syntax to standard DocBook code.
So if there is another, more appropriate way (which is not a WYSIWYG
editor) to achieve this, I can completely change the source line:
 <tblrow>Some text, Some more text <xref linkend="somelink">
even more text </tblrow>
as long as it's still easy to write :)
The only solution I found is to pass the linkend entry as an
attribute to tblrow, plus another attribute that specifies
the entry number.
But this is a very limited solution and will not allow me to
use xref in two entries, for example.
Additional note: I'm an absolute newbie in XML.

Thanks in advance,
 Ilya.

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: 
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


