xsl-list

RE: Collapsing run-on tag chains not working in saxon or xalan

2004-11-01 13:14:50
With the following XML and XSL, the Microsoft MSXML 4 DOM produces the
expected output, but Xalan 2.4 and 2.6 and Saxon 6.5.3 do not: they all
produce the same unexpected output.

I haven't read further, but this is almost certainly because Microsoft
(incorrectly, or at least stretching what the spec allows) strips
whitespace-only text nodes from the source document by default.

The purpose of this code is to collapse run-on chains like
<ilink>foo</ilink><ilink id="1234">bar</ilink> into a single element
<ilink>foobar<id id="1234"/></ilink>. The XSL also collapses run-on
chains of b, i, sup, sub, and similar tags.

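But in your input, the elements aren't adjacent to each other: they are
separated by whitespace-only text nodes, so preceding-sibling::node()[1]
and following-sibling::node()[1] select a text node rather than the
neighbouring ilink.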
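One way to make Xalan or Saxon see this input the way MSXML does is a top-level xsl:strip-space declaration. This is a minimal sketch, assuming that no whitespace-only text node in the source is ever significant; the original templates are elided and would stay unchanged. The trade-off is that every whitespace-only text node goes, so a lone space between two inline elements, as in <b>x</b> <i>y</i>, would disappear too.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Discard whitespace-only text nodes in every source element before
       any template runs. This approximates MSXML's default loading
       behaviour, so ilinks separated only by line breaks become true
       adjacent siblings and following-sibling::node()[1] finds the
       next ilink. -->
  <xsl:strip-space elements="*"/>
  <xsl:output encoding="ISO-8859-1"/>
  <!-- ... all templates from the original stylesheet, unchanged ... -->
</xsl:stylesheet>
```

A narrower list, such as elements="Paragraph ilink", would limit the stripping to the elements where runs occur.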

Michael Kay
http://www.saxonica.com



Can anyone explain to me whether Xalan and Saxon just have a bug, and,
preferably, how to get Xalan and/or Saxon to transform the way MSXML 4
does here (which I believe is correct)?

TMIA,
Richard Bondi


Sample input:

<Chapter>
      <ChapterTitle>The chapter title must be immediately 
followed by a
section title</ChapterTitle>
      <Body>
              <SectionTitle>The section title</SectionTitle>
              <Title>Internal Links: _ilink</Title>
              <Paragraph>The internal link to Proteins and 
Membranes, optionally
including the cont_id would look like: <ilink id="1234">Proteins and
              Membranes</ilink>. You could also just type 
<ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
              even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
              optional.</Paragraph>
              <Paragraph>Feel free to include crazy 
formatting, as in <ilink>CBIO|</ilink>
                      <ilink>
                              <i>Proteins</i>
                      </ilink>
                      <ilink> and Membranes</ilink> or <ilink>
                              <b>
                                      <i>Pr</i>
                              </b>
                      </ilink>
                      <ilink>
                              <sup>
                                      <b>
                                              <i>o</i>
                                      </b>
                              </sup>
                      </ilink>
                      <ilink>
                              <sub>
                                      <b>
                                              <i>t</i>
                                      </b>
                              </sub>
                      </ilink>
                      <ilink>
                              <b>
                                      <i>ei</i>
                              </b>
                      </ilink>
                      <ilink>
                              <b>
                                      <i>
                                              <u>n</u>
                                      </i>
                              </b>
                      </ilink>
                      <ilink>
                              <b>
                                      <i>s</i>
                              </b>
                      </ilink>
                      <ilink id="1234">and Membranes</ilink>. 
</Paragraph>
      </Body>
</Chapter>


XSL:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <xsl:output encoding="ISO-8859-1"/>
      <xsl:template match="/">
              <xsl:apply-templates/>
      </xsl:template>
      <!-- run of ilinks -->
      <xsl:template match="ilink">
              <xsl:if 
test="not(local-name(preceding-sibling::node()[1])='ilink')">
                      <ilink>
                              <xsl:if 
test="not(name(following-sibling::node()[1])='ilink')"><xsl:copy-of
select="@*"/></xsl:if>
                              <xsl:apply-templates/>
                              <xsl:if 
test="name(following-sibling::node()[1])='ilink'"><xsl:apply-templates
select="following-sibling::node()[1]" mode="following"/></xsl:if>
                      </ilink>
              </xsl:if>
      </xsl:template>
      <xsl:template match="ilink" mode="following" >
              <xsl:apply-templates/>
              <xsl:if 
test="not(name(following-sibling::node()[1])='ilink') and
@*"><id><xsl:copy-of select="@*"/></id></xsl:if>
              <xsl:if 
test="name(following-sibling::node()[1])='ilink'"><xsl:apply-templates
select="following-sibling::node()[1]" mode="following"/></xsl:if>
      </xsl:template>
      <!-- run of formatting tags, eg tags without attributes -->
      <xsl:template match="b | i | sup | sub | u | smallcaps 
| red" priority="2">
              <xsl:variable name="ename" select="name(.)"/>
              <xsl:if 
test="not(local-name(preceding-sibling::node()[1])=string($ename))">
                      <xsl:element name="{$ename}">
                              <xsl:apply-templates/>
                              <xsl:if test="name(following-sibling::node()[1])=string($ename)">
                                      <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
                              </xsl:if>
                      </xsl:element>
              </xsl:if>
      </xsl:template>
      <xsl:template match="b | i | sup | sub | u | smallcaps | red"
mode="following" >
              <xsl:variable name="ename" select="name(.)"/>
              <xsl:apply-templates/>
              <xsl:if test="name(following-sibling::node()[1])=string($ename)">
                      <xsl:apply-templates select="following-sibling::node()[1]" mode="following"/>
              </xsl:if>
      </xsl:template>
      <xsl:template match="@* | node()">
              <xsl:copy >
                      <xsl:apply-templates select="@*" />
                      <xsl:apply-templates />
              </xsl:copy>
      </xsl:template>
</xsl:stylesheet>
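If whitespace elsewhere in the input has to survive, a different approach (an alternative sketch, not one proposed in this thread) is to leave the source tree alone and make the sibling tests skip whitespace-only text nodes. The predicate [not(self::text()[not(normalize-space())])] keeps elements and any text containing non-space characters, so the [1] after it picks the nearest "real" sibling. These four templates would replace the four run-collapsing templates above; the identity template stays as is. One side effect with the sample input: the line breaks that sat between the merged ilinks are still copied by the identity template, just after the merged element instead of between its parts.

```xml
<!-- Sketch: run-collapsing templates that ignore whitespace-only text
     nodes when looking for the neighbouring sibling. -->
<xsl:template match="ilink">
  <!-- Only the first ilink of a run produces output. -->
  <xsl:if test="not(preceding-sibling::node()[not(self::text()[not(normalize-space())])][1][self::ilink])">
    <!-- The next ilink in the run, or an empty node-set. -->
    <xsl:variable name="next"
        select="following-sibling::node()[not(self::text()[not(normalize-space())])][1][self::ilink]"/>
    <ilink>
      <!-- A lone ilink keeps its attributes, as in the original. -->
      <xsl:if test="not($next)"><xsl:copy-of select="@*"/></xsl:if>
      <xsl:apply-templates/>
      <xsl:apply-templates select="$next" mode="following"/>
    </ilink>
  </xsl:if>
</xsl:template>

<xsl:template match="ilink" mode="following">
  <xsl:variable name="next"
      select="following-sibling::node()[not(self::text()[not(normalize-space())])][1][self::ilink]"/>
  <xsl:apply-templates/>
  <!-- The last ilink of a run contributes its attributes as an id child. -->
  <xsl:if test="not($next) and @*"><id><xsl:copy-of select="@*"/></id></xsl:if>
  <xsl:apply-templates select="$next" mode="following"/>
</xsl:template>

<xsl:template match="b | i | sup | sub | u | smallcaps | red" priority="2">
  <xsl:variable name="ename" select="name()"/>
  <!-- name() of a text node, or of an empty node-set, is "". -->
  <xsl:if test="not(name(preceding-sibling::node()[not(self::text()[not(normalize-space())])][1]) = $ename)">
    <xsl:element name="{$ename}">
      <xsl:apply-templates/>
      <xsl:apply-templates mode="following"
          select="following-sibling::node()[not(self::text()[not(normalize-space())])][1][name() = $ename]"/>
    </xsl:element>
  </xsl:if>
</xsl:template>

<xsl:template match="b | i | sup | sub | u | smallcaps | red" mode="following">
  <xsl:variable name="ename" select="name()"/>
  <xsl:apply-templates/>
  <xsl:apply-templates mode="following"
      select="following-sibling::node()[not(self::text()[not(normalize-space())])][1][name() = $ename]"/>
</xsl:template>
```

Compared with xsl:strip-space, this keeps every whitespace node in the output at the cost of a longer XPath expression repeated in each template.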


Output using msxml4 (correct output, IMHO):

<Chapter>
      <ChapterTitle>The chapter title must be immediately 
followed by a
section title</ChapterTitle>
      <Body>
              <SectionTitle>The section title</SectionTitle>
              <Title>Internal Links: _ilink</Title>
              <Paragraph>The internal link to Proteins and 
Membranes, optionally
including the cont_id would look like: <ilink id="1234">Proteins and
              Membranes</ilink>. You could also just type 
<ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
              even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
              optional.</Paragraph>
              <Paragraph>Feel free to include crazy formatting, as in
<ilink>CBIO|<i>Proteins</i> and Membranes</ilink> or <ilink>
                              <b>
                                      <i>Pr</i>
                              </b>
                              <sup>
                                      <b>
                                              <i>o</i>
                                      </b>
                              </sup>
                              <sub>
                                      <b>
                                              <i>t</i>
                                      </b>
                              </sub>
                              <b>
                                      <i>ei</i>
                              </b>
                              <b>
                                      <i>
                                              <u>n</u>
                                      </i>
                              </b>
                              <b>
                                      <i>s</i>
                              </b>and Membranes<id id="1234"/>
                      </ilink>. </Paragraph>
      </Body>
</Chapter>


Output of Xalan 2.4, 2.6.0, and Instant Saxon 6.5.3 (it actually appears
to do nothing):

<Chapter>
      <ChapterTitle>The chapter title must be immediately 
followed by a
section title</ChapterTitle>
      <Body>
              <SectionTitle>The section title</SectionTitle>
              <Title>Internal Links: _ilink</Title>
              <Paragraph>The internal link to Proteins and 
Membranes, optionally
including the cont_id would look like: <ilink id="1234">Proteins and
              Membranes</ilink>. You could also just type 
<ilink>Proteins and
Membranes</ilink>. Another option is <ilink>CBIO|Proteins and
Membranes</ilink>, or
              even just <ilink id="1234"/>. You can also do <ilink
id="1234">CBIO|Proteins and Membranes</ilink>. Spaces on either side
of a pipe (|) are
              optional.</Paragraph>
              <Paragraph>Feel free to include crazy 
formatting, as in <ilink>CBIO|</ilink>
                      <ilink>
                              <i>Proteins</i>
                      </ilink>
                      <ilink> and Membranes</ilink> or <ilink>
                              <b>
                                      <i>Pr</i>
                              </b>
                      </ilink>
                      <ilink>
                              <sup>
                                      <b>
                                              <i>o</i>
                                      </b>
                              </sup>
                      </ilink>
                      <ilink>
                              <sub>
                                      <b>
                                              <i>t</i>
                                      </b>
                              </sub>
                      </ilink>
                      <ilink>
                              <b>
                                      <i>ei</i>
                              </b>
                      </ilink>
                      <ilink>
                              <b>
                                      <i>
                                              <u>n</u>
                                      </i>
                              </b>
                      </ilink>
                      <ilink>
                              <b>
                                      <i>s</i>
                              </b>
                      </ilink>
                      <ilink id="1234">and Membranes</ilink>. 
</Paragraph>
      </Body>
</Chapter>

--+------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--+--