xsl-list

Collapsing run-on tag chains not working in saxon or xalan

2004-11-01 12:05:31
Dear All,

With the following xml and xsl, Microsoft's msxml 4 DOM produces the
output I expect, but xalan 2.4 and 2.6 and saxon 6.5.3 do not: all three
produce the same, unexpected output.

The purpose of this code is to collapse run-on chains like
<ilink>foo</ilink><ilink id="1234">bar</ilink> into a single tag
<ilink>foobar<id id="1234"/></ilink>. The attributes of the last ilink
in the chain are moved into an id child element. The xsl will also
collapse run-on chains of b, i, sup, sub, and similar tags.

Can anyone explain to me whether xalan and saxon just have a bug, and
preferably how to get xalan and/or saxon to transform the way msxml4
does here (which I believe is correct)?

TMIA,
Richard Bondi


Sample input:

<Chapter>
        <ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
        <Body>
                <SectionTitle>The section title</SectionTitle>
                <Title>Internal Links: _ilink</Title>
                <Paragraph>The internal link to Proteins and Membranes, 
optionally
including the cont_id would look like: <ilink id="1234">Proteins and
                Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
                even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
                optional.</Paragraph>
                <Paragraph>Feel free to include crazy formatting, as in 
<ilink>CBIO|</ilink>
                        <ilink>
                                <i>Proteins</i>
                        </ilink>
                        <ilink> and Membranes</ilink> or <ilink>
                                <b>
                                        <i>Pr</i>
                                </b>
                        </ilink>
                        <ilink>
                                <sup>
                                        <b>
                                                <i>o</i>
                                        </b>
                                </sup>
                        </ilink>
                        <ilink>
                                <sub>
                                        <b>
                                                <i>t</i>
                                        </b>
                                </sub>
                        </ilink>
                        <ilink>
                                <b>
                                        <i>ei</i>
                                </b>
                        </ilink>
                        <ilink>
                                <b>
                                        <i>
                                                <u>n</u>
                                        </i>
                                </b>
                        </ilink>
                        <ilink>
                                <b>
                                        <i>s</i>
                                </b>
                        </ilink>
                        <ilink id="1234">and Membranes</ilink>. </Paragraph>
        </Body>
</Chapter>


Xsl:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output encoding="ISO-8859-1"/>
        <xsl:template match="/">
                <xsl:apply-templates/>
        </xsl:template>
        <!-- run of ilinks -->
        <xsl:template match="ilink">
                <xsl:if test="not(local-name(preceding-sibling::node()[1])='ilink')">
                        <ilink>
                                <xsl:if test="not(name(following-sibling::node()[1])='ilink')">
                                        <xsl:copy-of select="@*"/>
                                </xsl:if>
                                <xsl:apply-templates/>
                                <xsl:if test="name(following-sibling::node()[1])='ilink'">
                                        <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
                                </xsl:if>
                        </ilink>
                </xsl:if>
        </xsl:template>
        <xsl:template match="ilink" mode="following">
                <xsl:apply-templates/>
                <xsl:if test="not(name(following-sibling::node()[1])='ilink') and @*">
                        <id><xsl:copy-of select="@*"/></id>
                </xsl:if>
                <xsl:if test="name(following-sibling::node()[1])='ilink'">
                        <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
                </xsl:if>
        </xsl:template>
        <!-- run of formatting tags, eg tags without attributes -->
        <xsl:template match="b | i | sup | sub | u | smallcaps | red" priority="2">
                <xsl:variable name="ename" select="name(.)"/>
                <xsl:if test="not(local-name(preceding-sibling::node()[1])=string($ename))">
                        <xsl:element name="{$ename}">
                                <xsl:apply-templates/>
                                <xsl:if test="name(following-sibling::node()[1])=string($ename)">
                                        <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
                                </xsl:if>
                        </xsl:element>
                </xsl:if>
        </xsl:template>
        <xsl:template match="b | i | sup | sub | u | smallcaps | red" mode="following">
                <xsl:variable name="ename" select="name(.)"/>
                <xsl:apply-templates/>
                <xsl:if test="name(following-sibling::node()[1])=string($ename)">
                        <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
                </xsl:if>
        </xsl:template>
        <xsl:template match="@* | node()">
                <xsl:copy >
                        <xsl:apply-templates select="@*" />
                        <xsl:apply-templates />
                </xsl:copy>
        </xsl:template>
</xsl:stylesheet>


Output using msxml4 (correct output, IMHO):

<Chapter>
        <ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
        <Body>
                <SectionTitle>The section title</SectionTitle>
                <Title>Internal Links: _ilink</Title>
                <Paragraph>The internal link to Proteins and Membranes, 
optionally
including the cont_id would look like: <ilink id="1234">Proteins and
                Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
                even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
                optional.</Paragraph>
                <Paragraph>Feel free to include crazy formatting, as in
<ilink>CBIO|<i>Proteins</i> and Membranes</ilink> or <ilink>
                                <b>
                                        <i>Pr</i>
                                </b>
                                <sup>
                                        <b>
                                                <i>o</i>
                                        </b>
                                </sup>
                                <sub>
                                        <b>
                                                <i>t</i>
                                        </b>
                                </sub>
                                <b>
                                        <i>ei</i>
                                </b>
                                <b>
                                        <i>
                                                <u>n</u>
                                        </i>
                                </b>
                                <b>
                                        <i>s</i>
                                </b>and Membranes<id id="1234"/>
                        </ilink>. </Paragraph>
        </Body>
</Chapter>


Output of xalan 2.4, 2.6.0, and instant saxon 6.5.3 (appears to do
nothing, actually):

<Chapter>
        <ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
        <Body>
                <SectionTitle>The section title</SectionTitle>
                <Title>Internal Links: _ilink</Title>
                <Paragraph>The internal link to Proteins and Membranes, 
optionally
including the cont_id would look like: <ilink id="1234">Proteins and
                Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
                even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
                optional.</Paragraph>
                <Paragraph>Feel free to include crazy formatting, as in 
<ilink>CBIO|</ilink>
                        <ilink>
                                <i>Proteins</i>
                        </ilink>
                        <ilink> and Membranes</ilink> or <ilink>
                                <b>
                                        <i>Pr</i>
                                </b>
                        </ilink>
                        <ilink>
                                <sup>
                                        <b>
                                                <i>o</i>
                                        </b>
                                </sup>
                        </ilink>
                        <ilink>
                                <sub>
                                        <b>
                                                <i>t</i>
                                        </b>
                                </sub>
                        </ilink>
                        <ilink>
                                <b>
                                        <i>ei</i>
                                </b>
                        </ilink>
                        <ilink>
                                <b>
                                        <i>
                                                <u>n</u>
                                        </i>
                                </b>
                        </ilink>
                        <ilink>
                                <b>
                                        <i>s</i>
                                </b>
                        </ilink>
                        <ilink id="1234">and Membranes</ilink>. </Paragraph>
        </Body>
</Chapter>
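

Possible cause and workaround (an assumption, not a confirmed diagnosis):

The stylesheet tests preceding-sibling::node()[1] and
following-sibling::node()[1]. In the sample input the ilink elements are
separated by line breaks and indentation, so for a processor that keeps
whitespace-only text nodes the nearest sibling of each ilink is a text
node, not another ilink, and no chain is ever detected. XSLT 1.0
preserves such text nodes in the source tree unless the stylesheet says
otherwise, whereas the msxml DOM discards them on load by default (its
preserveWhiteSpace property defaults to false). If that is what is
happening here, xalan and saxon would be following the spec, and the
msxml result would come from the parser setting rather than from the
transform itself.

A minimal sketch of a workaround under that assumption is to add
xsl:strip-space as a top-level element. Only the strip-space line is
new; the rest stands for the stylesheet above:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output encoding="ISO-8859-1"/>
        <!-- Assumption: whitespace-only text nodes between run-on tags are
             not significant, so drop them from the source tree before
             matching. Adjacent ilink (or b, i, ...) elements then become
             each other's nearest node() sibling. -->
        <xsl:strip-space elements="*"/>
        <!-- ... the templates from the stylesheet above, unchanged ... -->
</xsl:stylesheet>

Text such as " or " between two ilink elements is not whitespace-only,
so it is kept and still breaks a chain. The trade-off is that a
whitespace-only gap that was meant as a visible space between two inline
elements is lost as well; if that matters, list only the affected parent
elements in the elements attribute instead of using "*".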