Dear All,
With the following XML and XSL, Microsoft's MSXML 4 DOM produces the
expected output, but Xalan 2.4 and 2.6 and Saxon 6.5.3 do not: they all
produce the same unexpected output.
The purpose of this code is to collapse run-on chains like
<ilink>foo</ilink><ilink id="1234">bar</ilink> into a single element,
<ilink>foo bar<id id="1234"/></ilink>. The XSL will also collapse run-on
chains of b, i, sup, sub, and similar tags.
Can anyone explain whether Xalan and Saxon simply have a bug here and,
preferably, how to get Xalan and/or Saxon to transform the way MSXML 4
does (which I believe is correct)?
TMIA,
Richard Bondi
Sample input:
<Chapter>
<ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
<Body>
<SectionTitle>The section title</SectionTitle>
<Title>Internal Links: _ilink</Title>
<Paragraph>The internal link to Proteins and Membranes,
optionally
including the cont_id would look like: <ilink id="1234">Proteins and
Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
optional.</Paragraph>
<Paragraph>Feel free to include crazy formatting, as in
<ilink>CBIO|</ilink>
<ilink>
<i>Proteins</i>
</ilink>
<ilink> and Membranes</ilink> or <ilink>
<b>
<i>Pr</i>
</b>
</ilink>
<ilink>
<sup>
<b>
<i>o</i>
</b>
</sup>
</ilink>
<ilink>
<sub>
<b>
<i>t</i>
</b>
</sub>
</ilink>
<ilink>
<b>
<i>ei</i>
</b>
</ilink>
<ilink>
<b>
<i>
<u>n</u>
</i>
</b>
</ilink>
<ilink>
<b>
<i>s</i>
</b>
</ilink>
<ilink id="1234">and Membranes</ilink>. </Paragraph>
</Body>
</Chapter>
XSL:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="ISO-8859-1"/>
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
  <!-- run of ilinks -->
  <xsl:template match="ilink">
    <xsl:if test="not(local-name(preceding-sibling::node()[1])='ilink')">
      <ilink>
        <xsl:if test="not(name(following-sibling::node()[1])='ilink')">
          <xsl:copy-of select="@*"/>
        </xsl:if>
        <xsl:apply-templates/>
        <xsl:if test="name(following-sibling::node()[1])='ilink'">
          <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
        </xsl:if>
      </ilink>
    </xsl:if>
  </xsl:template>
  <xsl:template match="ilink" mode="following">
    <xsl:apply-templates/>
    <xsl:if test="not(name(following-sibling::node()[1])='ilink') and @*">
      <id>
        <xsl:copy-of select="@*"/>
      </id>
    </xsl:if>
    <xsl:if test="name(following-sibling::node()[1])='ilink'">
      <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
    </xsl:if>
  </xsl:template>
  <!-- run of formatting tags, eg tags without attributes -->
  <xsl:template match="b | i | sup | sub | u | smallcaps | red" priority="2">
    <xsl:variable name="ename" select="name(.)"/>
    <xsl:if test="not(local-name(preceding-sibling::node()[1])=string($ename))">
      <xsl:element name="{$ename}">
        <xsl:apply-templates/>
        <xsl:if test="name(following-sibling::node()[1])=string($ename)">
          <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
        </xsl:if>
      </xsl:element>
    </xsl:if>
  </xsl:template>
  <xsl:template match="b | i | sup | sub | u | smallcaps | red" mode="following">
    <xsl:variable name="ename" select="name(.)"/>
    <xsl:apply-templates/>
    <xsl:if test="name(following-sibling::node()[1])=string($ename)">
      <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
    </xsl:if>
  </xsl:template>
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Output using msxml4 (correct output, IMHO):
<Chapter>
<ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
<Body>
<SectionTitle>The section title</SectionTitle>
<Title>Internal Links: _ilink</Title>
<Paragraph>The internal link to Proteins and Membranes,
optionally
including the cont_id would look like: <ilink id="1234">Proteins and
Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
optional.</Paragraph>
<Paragraph>Feel free to include crazy formatting, as in
<ilink>CBIO|<i>Proteins</i> and Membranes</ilink> or <ilink>
<b>
<i>Pr</i>
</b>
<sup>
<b>
<i>o</i>
</b>
</sup>
<sub>
<b>
<i>t</i>
</b>
</sub>
<b>
<i>ei</i>
</b>
<b>
<i>
<u>n</u>
</i>
</b>
<b>
<i>s</i>
</b>and Membranes<id id="1234"/>
</ilink>. </Paragraph>
</Body>
</Chapter>
Output of xalan 2.4, 2.6.0, and instant saxon 6.5.3 (appears to do nothing, actually):
<Chapter>
<ChapterTitle>The chapter title must be immediately followed by a
section title</ChapterTitle>
<Body>
<SectionTitle>The section title</SectionTitle>
<Title>Internal Links: _ilink</Title>
<Paragraph>The internal link to Proteins and Membranes,
optionally
including the cont_id would look like: <ilink id="1234">Proteins and
Membranes</ilink>. You could also just type <ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
optional.</Paragraph>
<Paragraph>Feel free to include crazy formatting, as in
<ilink>CBIO|</ilink>
<ilink>
<i>Proteins</i>
</ilink>
<ilink> and Membranes</ilink> or <ilink>
<b>
<i>Pr</i>
</b>
</ilink>
<ilink>
<sup>
<b>
<i>o</i>
</b>
</sup>
</ilink>
<ilink>
<sub>
<b>
<i>t</i>
</b>
</sub>
</ilink>
<ilink>
<b>
<i>ei</i>
</b>
</ilink>
<ilink>
<b>
<i>
<u>n</u>
</i>
</b>
</ilink>
<ilink>
<b>
<i>s</i>
</b>
</ilink>
<ilink id="1234">and Membranes</ilink>. </Paragraph>
</Body>
</Chapter>
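One possible cause worth checking, offered as an assumption rather than a confirmed diagnosis: in the XSLT 1.0 data model, whitespace-only text nodes in the source are kept unless the stylesheet strips them, whereas the MSXML DOM discards them on load by default (its preserveWhiteSpace property defaults to false). If that is what is happening, then under Xalan and Saxon preceding-sibling::node()[1] and following-sibling::node()[1] for the ilink elements in the sample select the line-break text node between the tags, name() returns an empty string, and no run is ever detected. The sketch below adds xsl:strip-space and, for brevity, repeats only the ilink templates and the identity template. The choice of elements="*" is an assumption and may be too broad if whitespace-only text is significant elsewhere in the real documents (for example, a lone space between two inline elements).

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output encoding="ISO-8859-1"/>

  <!-- Assumption: whitespace-only text nodes in the input are insignificant.
       Stripping them makes adjacent ilink elements direct neighbouring siblings. -->
  <xsl:strip-space elements="*"/>

  <!-- First ilink of a run: emit one ilink and pull in the rest of the run. -->
  <xsl:template match="ilink">
    <xsl:if test="not(name(preceding-sibling::node()[1])='ilink')">
      <ilink>
        <!-- Attributes are copied here only when the ilink stands alone. -->
        <xsl:if test="not(name(following-sibling::node()[1])='ilink')">
          <xsl:copy-of select="@*"/>
        </xsl:if>
        <xsl:apply-templates/>
        <xsl:if test="name(following-sibling::node()[1])='ilink'">
          <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
        </xsl:if>
      </ilink>
    </xsl:if>
  </xsl:template>

  <!-- Later ilinks of a run: contribute their content only;
       the last one also emits an id element carrying its attributes. -->
  <xsl:template match="ilink" mode="following">
    <xsl:apply-templates/>
    <xsl:if test="not(name(following-sibling::node()[1])='ilink') and @*">
      <id><xsl:copy-of select="@*"/></id>
    </xsl:if>
    <xsl:if test="name(following-sibling::node()[1])='ilink'">
      <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
    </xsl:if>
  </xsl:template>

  <!-- Identity template for everything else. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Testing against the nearest sibling element instead (preceding-sibling::*[1]) would also ignore the whitespace, but it would additionally merge ilink elements separated by real text, such as the " or " in the sample paragraph, so stripping whitespace seems the closer match to the intended behaviour.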