
Re: [xsl] Processing mixed content. [Was: Parsing complex line (mixed text and markup)]

2008-02-17 06:38:45
On 16/02/2008, Ilya Lifshits <chehlo(_at_)gmail(_dot_)com> wrote:
I wonder if Michael's first suggestion has disadvantages in your opinion
and you are trying to improve on it, or is this just another possible
solution?

I would think this solution is more general, but I had hoped to get
Michael to comment on that. Certainly it's easy to implement in XSLT
1.0.

Anyway, here is a _corrected_ version of the above, tested with Saxon 9.0:

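<xsl:template match="tbentry">
  <xsl:copy>
    <xsl:apply-templates select="@*"/>
    <!-- remember the tbentry element: inside the loops the context item is a string -->
    <xsl:variable name="curr" select="."/>
    <!-- flatten the content: text is kept, each child element becomes a marker -->
    <xsl:variable name="temp">
      <xsl:apply-templates select="node()" mode="text"/>
    </xsl:variable>
    <!-- one entry per comma-separated piece -->
    <xsl:for-each select="tokenize($temp, ',')">
      <entry>
        <!-- split the current piece (not the whole string) at the markers -->
        <xsl:for-each select="tokenize(., '@xy')">
          <xsl:choose>
            <!-- a token 'xyN' stands for the N-th child node of tbentry -->
            <xsl:when test="starts-with(., 'xy')">
              <xsl:apply-templates select="$curr/node()[xs:integer(substring(current(), 3))]"/>
            </xsl:when>
            <xsl:otherwise>
              <xsl:value-of select="."/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:for-each>
      </entry>
    </xsl:for-each>
  </xsl:copy>
</xsl:template>

<!-- replace each child element by a marker that carries its position -->
<xsl:template match="*" mode="text">
  <xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
</xsl:template>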
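
The two templates above are fragments. To run them they have to sit in a stylesheet that declares the xs prefix (needed for xs:integer) and that has some rule for copying the attributes and the re-inserted elements such as xref; with only the built-in templates those would be reduced to their text values. A minimal sketch of such a wrapper, assuming an identity template is acceptable for everything other than tbentry (the identity template and the exclude-result-prefixes setting are assumptions, not part of the posted solution):

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Assumption: every node without a more specific rule is copied unchanged.
       This is what copies the tbentry attributes and the xref elements. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- The match="tbentry" template and the mode="text" template
       from the message above would be placed here. -->

</xsl:stylesheet>
```

The identity template applies only in the default mode, so it does not interfere with the mode="text" template, and the match="tbentry" template takes precedence over it because of its higher default priority.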

Manfred

On 16/02/2008, Ilya Lifshits <chehlo(_at_)gmail(_dot_)com> wrote:
I'm not able to comment on whether this solution is valid, since I'm a
complete newbie. I wonder if Michael's first suggestion has disadvantages
in your opinion and you are trying to improve on it, or is this just
another possible solution?
From my newbie point of view, Michael's suggestion is more
straightforward and clear.

Ilya.


On Feb 15, 2008 10:43 PM, Manfred Staudinger
<manfred(_dot_)staudinger(_at_)gmail(_dot_)com> wrote:
Hi All,

I would like to propose a third variant and to get your comments about it.

On 15/02/2008, Michael Kay <mike(_at_)saxonica(_dot_)com> wrote:
On 14/02/2008, Ilya Lifshits <chehlo(_at_)gmail(_dot_)com> wrote:
I'm using XSLT 2.0 processors, both Saxon and Altova.

I'm trying to parse a complex line like:
<tbentry>Some text, Some more text <xref linkend="somelink"/>
even more text , , ,</tbentry>

and get the following output:

<row>
<entry>Some text</entry>
<entry>Some more text <xref linkend="ut_man_related_docs"/> and even more text </entry>
</row>

Number of entries is not constant.

I easily found a solution for the case without mixed text and
markup by using the tokenize function, but I failed to separate
text and markup using this approach.
An example can be found here: http://pastebin.com/m40fd204f

To formalize the goal: I want to simplify the life of our tech
writers by creating wrappers on top of DocBook that will
help transform my own syntax into standard DocBook code.
So if there is another, more appropriate way (other than a WYSIWYG
editor) to achieve this, I can completely change the source line:
<tblrow>Some text, Some more text <xref linkend="somelink"/>
even more text </tblrow>
as long as it's still easy to write.

This problem has come up in the past and it's not particularly easy. There
seem to be two main approaches:

(a) convert the string delimiters into element markup, and then use
grouping facilities (xsl:for-each-group) to analyze the overall structure

(b) convert the markup into string delimiters, and then use
xsl:analyze-string.

Both work, but I think (a) is probably a bit easier.

Do all the delimiters (commas) occur in top-level text nodes, or can they
occur nested within elements? I'll assume the former.

Start by making a copy of the data in which the commas are replaced by
<comma/> elements:

<xsl:template match="tbentry">
  <!-- temporary copy of the content with each comma turned into <comma/> -->
  <xsl:variable name="temp">
    <xsl:apply-templates mode="replace-commas"/>
  </xsl:variable>
  <!-- every <comma/> starts a new group; the group without it becomes an entry -->
  <xsl:for-each-group select="$temp/child::node()"
                      group-starting-with="comma">
    <entry><xsl:copy-of select="current-group()[not(self::comma)]"/></entry>
  </xsl:for-each-group>
</xsl:template>

<!-- elements are copied unchanged -->
<xsl:template match="*" mode="replace-commas">
  <xsl:copy-of select="."/>
</xsl:template>

<!-- in text nodes, replace each comma by a <comma/> element -->
<xsl:template match="text()" mode="replace-commas">
  <xsl:analyze-string select="." regex=",">
    <xsl:matching-substring><comma/></xsl:matching-substring>
    <xsl:non-matching-substring><xsl:value-of select="."/></xsl:non-matching-substring>
  </xsl:analyze-string>
</xsl:template>


(c) convert the elements into strings which contain the position()
of the element. After processing the string, reinsert those elements.

Let's assume the document does not contain 'xy'. Then
<xsl:template match="tbentry">
<xsl:variable name="temp">
   <xsl:apply-templates mode="text"/>
</xsl:variable>
<xsl:for-each select="tokenize($temp, ',')">
   <entry>
      <xsl:for-each select="tokenize(., '@xy')">
         <xsl:choose>
            <xsl:when test="starts-with(., 'xy')">
<!-- A -->   <xsl:apply-templates select="/node()[xs:integer(substring(., 3))]"/>
            </xsl:when>
            <xsl:otherwise>
               <xsl:value-of select="."/>
            </xsl:otherwise>
         </xsl:choose>
      </xsl:for-each>
   </entry>
</xsl:for-each>
</xsl:template>

<xsl:template match="*" mode="text">
        <xsl:value-of select="concat('@xyxy', position(), '@xy')"/>
</xsl:template>
<xsl:template match="text()" mode="text">
        <xsl:value-of select="."/>
</xsl:template>

Not tested and I'm uncertain about (A), but a very similar solution
works fine in XSLT 1.0, where the processing of the string is done by
recursive templates.
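
As an illustration of the recursive-template style mentioned above, here is a minimal XSLT 1.0 sketch that splits a string at commas into entry elements. It is not the XSLT 1.0 solution referred to above: the template name split and its parameters text and delim are hypothetical, and the re-insertion of the marked elements is left out, so only the string value of tbentry is processed.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical named template: emits one <entry> per delimited piece -->
  <xsl:template name="split">
    <xsl:param name="text"/>
    <xsl:param name="delim" select="','"/>
    <xsl:choose>
      <!-- a delimiter is left: output the head, recurse on the tail -->
      <xsl:when test="contains($text, $delim)">
        <entry><xsl:value-of select="substring-before($text, $delim)"/></entry>
        <xsl:call-template name="split">
          <xsl:with-param name="text" select="substring-after($text, $delim)"/>
          <xsl:with-param name="delim" select="$delim"/>
        </xsl:call-template>
      </xsl:when>
      <!-- no delimiter left: the remainder is the last piece -->
      <xsl:otherwise>
        <entry><xsl:value-of select="$text"/></entry>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Splits the string value of tbentry; child elements are not preserved here -->
  <xsl:template match="tbentry">
    <row>
      <xsl:call-template name="split">
        <xsl:with-param name="text" select="string(.)"/>
      </xsl:call-template>
    </row>
  </xsl:template>

</xsl:stylesheet>
```

The same recursion pattern, applied a second time to the '@xy' markers inside each piece, would take the place of the two nested tokenize() loops.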

Thanks in advance,

Manfred
http://documenta.rudolphina.org/Indices/Index.html

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--


