I'm trying to post-process the HTML produced via Adobe Acrobat's PDF
export. (Actually, XHTML via Tidy from Acrobat's HTML 4.01.) Acrobat
does something very funky with end-of-line hyphens that it deems "soft",
namely wrapping the preceding and following text nodes inside a styled
<span> and removing the hyphen. To simplify the situation, if the input
text was
The volumes of the Docu-
mentary History of the Rati-
fication of the Constitution are heavy.
the output would be something like
<p>The volumes of the <i>Docu</i><i>mentary
History of the Rati</i><i>fication of the Constitution</i>
are heavy.</p>
Now there are various reasons why it would be nice to transform these
constructs so that all consecutive <i> elements are wrapped in a single
element. I've come up with the following XSLT 2.0 templates that rely on
the '>>' operator to group consecutive sibling <i>'s for processing. It
works on some sample data, but it is a risky transform because if the
logic is not perfect, there could be dropped <i>'s. Can anyone see a
potential case where this would fail?
<xsl:template match="i">
  <xsl:choose>
    <xsl:when test="preceding-sibling::node()[1][self::i]">
      <!-- omit, the next when-clause handles me -->
    </xsl:when>
    <xsl:when test="following-sibling::node()[1][self::i]">
      <xsl:variable name="stopNode"
        select="following-sibling::node()[not(self::i)][1]"/>
      <xsl:copy>
        <xsl:apply-templates/>
        <xsl:apply-templates
          select="following-sibling::i[not(. >> $stopNode)]"
          mode="copy"/>
      </xsl:copy>
    </xsl:when>
    <xsl:otherwise>
      <xsl:copy><xsl:apply-templates/></xsl:copy>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="i" mode="copy">
  <xsl:apply-templates/>
</xsl:template>
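
For comparison, the same merge could perhaps be expressed with XSLT 2.0's
xsl:for-each-group and group-adjacent, which groups runs of sibling <i>'s
without computing a stop node. This is only a sketch, not something run
against the Acrobat output: the identity template and the match="*[i]"
pattern are assumptions about the rest of the stylesheet, and the variable
name $run is made up for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed identity template: copy anything not handled below. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Any element with at least one <i> child: regroup its children. -->
  <xsl:template match="*[i]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Adjacent children with the same key (is / is not an <i>)
           fall into the same group. -->
      <xsl:for-each-group select="node()"
                          group-adjacent="boolean(self::i)">
        <!-- $run is a hypothetical name for the current group. -->
        <xsl:variable name="run" select="current-group()"/>
        <xsl:choose>
          <xsl:when test="current-grouping-key()">
            <!-- A run of one or more consecutive <i>'s: copy the first
                 one as the wrapper and pull in the children of all. -->
            <xsl:for-each select="$run[1]">
              <xsl:copy>
                <xsl:apply-templates select="@*"/>
                <xsl:apply-templates select="$run/node()"/>
              </xsl:copy>
            </xsl:for-each>
          </xsl:when>
          <xsl:otherwise>
            <!-- Everything else passes through unchanged. -->
            <xsl:apply-templates select="$run"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the templates above, this keeps the attributes of the first <i> in
each run. Like them, it treats any node between two <i>'s, including a
whitespace-only text node, as ending the run.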
DS
--
David Sewell, Editorial and Technical Manager
ROTUNDA, The University of Virginia Press
PO Box 801079, Charlottesville, VA 22904-4318 USA
Courier: 310 Old Ivy Way, Suite 302, Charlottesville VA 22903
Email: dsewell(_at_)virginia(_dot_)edu Tel: +1 434 924 9973
Web: http://rotunda.upress.virginia.edu/