Here's an XSLT 2.0 solution to the problem. It involves two stylesheets
(though they could be combined with a bit more effort). The algorithm goes
like this:
First, chunk up all the bits, which turns this into a grouping problem.
Second, solve the grouping problem.
Given the following XML file:
<doc>
<paragraph num="1">Yadda Yadda Yadda <italic>Italic Yadda</italic>
Yadda: <blockquote>Blah Blah Blah Blah</blockquote> Yackity Yack
Yack</paragraph>
</doc>
Use this XSL file:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<x>
<xsl:apply-templates/>
</x>
</xsl:template>
<xsl:template match="paragraph">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="paragraph/text()">
<p num="{../@num}"
group="{count(preceding-sibling::blockquote)}"><xsl:value-of
select="."/></p>
</xsl:template>
<xsl:template match="blockquote">
<blockquote><xsl:apply-templates/></blockquote>
</xsl:template>
<xsl:template match="italic">
<p num="{../@num}"
group="{count(preceding-sibling::blockquote)}"><span
style="font-style:italic"><xsl:apply-templates/></span></p>
</xsl:template>
</xsl:stylesheet>
That stylesheet creates the chunks:
<?xml version="1.0" encoding="UTF-8"?>
<x>
<p num="1" group="0">Yadda Yadda Yadda </p>
<p num="1" group="0"><span style="font-style:italic">Italic
Yadda</span></p>
<p num="1" group="0"> Yadda: </p>
<blockquote>Blah Blah Blah Blah</blockquote>
<p num="1" group="1"> Yackity Yack Yack</p>
</x>
Now it's a grouping problem, which can be solved in XSLT 2.0 with this
stylesheet:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:key name="mixed" match="p" use="@group"/>
<xsl:template match="x">
<html>
<head>
<title>Paragraph Chunking Test</title>
</head>
<body>
<xsl:for-each-group select="p" group-by="@group">
<p>
<xsl:for-each select="../p[@group=current-grouping-key()]">
<xsl:apply-templates/>
</xsl:for-each>
</p>
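<!-- Emit only the blockquotes that directly follow this group; selecting
every following blockquote would repeat later ones when a paragraph
contains more than one. -->
<xsl:apply-templates
select="current-group()[last()]/following-sibling::blockquote[preceding-sibling::p[1]
is current-group()[last()]]"/>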
</xsl:for-each-group>
</body>
</html>
</xsl:template>
<xsl:template match="p"/>
<xsl:template match="blockquote">
<xsl:copy-of select="."/>
</xsl:template>
<xsl:template match="span">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
which yields:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Paragraph Chunking Test</title>
</head>
<body>
<p>Yadda Yadda Yadda <span style="font-style:italic">Italic
Yadda</span> Yadda:
</p>
<blockquote>Blah Blah Blah Blah</blockquote>
<p> Yackity Yack Yack</p>
</body>
</html>
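The two passes can probably be folded into a single stylesheet by grouping the children of each paragraph directly with group-adjacent. The following is only a sketch and has not been run through Saxon or any other processor. It assumes the same input vocabulary as above (doc, paragraph, italic, blockquote) and that blockquote only ever appears as a direct child of paragraph.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="/doc">
    <html>
      <head>
        <title>Paragraph Chunking Test</title>
      </head>
      <body>
        <xsl:apply-templates select="paragraph"/>
      </body>
    </html>
  </xsl:template>

  <!-- Split the paragraph's children into runs of adjacent nodes that are
       either all blockquotes (key true) or all something else (key false). -->
  <xsl:template match="paragraph">
    <xsl:for-each-group select="node()"
                        group-adjacent="boolean(self::blockquote)">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <!-- A run of blockquotes: copy them through unchanged. -->
          <xsl:copy-of select="current-group()"/>
        </xsl:when>
        <xsl:otherwise>
          <!-- A run of text and inline elements: wrap it in one p. -->
          <p>
            <xsl:apply-templates select="current-group()"/>
          </p>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

  <xsl:template match="italic">
    <span style="font-style:italic"><xsl:apply-templates/></span>
  </xsl:template>
</xsl:stylesheet>
```

Because the grouping happens per paragraph, runs from different paragraphs are never merged. One caveat: whitespace-only text between two blockquotes would form its own run and come out as an empty-looking p.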
I have not yet gotten an XSLT 1.0 grouping solution for this problem (I
don't have much time to spend on this issue). I am sending along the XSLT
2.0 solution (which took me perhaps 10 minutes to do - I love
xsl:for-each-group) just to show one workable (IMHO) way to solve the
problem. James Fuller has shown us another, but I think there's value in
multiple approaches.
I tested it all with Saxon 8.4, by the way.
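For anyone limited to XSLT 1.0, Muenchian grouping over the chunked output is the obvious thing to try for the second pass. The sketch below is untested (it was not part of the Saxon run mentioned above). The key name by-group is made up for illustration, and the sketch assumes a single source paragraph, so that the group values are unique, and at most one blockquote after each run of p elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <!-- Index every chunk by its group number. -->
  <xsl:key name="by-group" match="p" use="@group"/>

  <xsl:template match="x">
    <html>
      <head>
        <title>Paragraph Chunking Test</title>
      </head>
      <body>
        <!-- Visit only the first p of each group (Muenchian grouping). -->
        <xsl:for-each
          select="p[generate-id() = generate-id(key('by-group', @group)[1])]">
          <p>
            <!-- Merge the content of every chunk in this group. -->
            <xsl:copy-of select="key('by-group', @group)/node()"/>
          </p>
          <!-- Copy the blockquote, if any, that immediately follows
               the last chunk of the group. -->
          <xsl:copy-of
            select="key('by-group', @group)[last()]/following-sibling::*[1][self::blockquote]"/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

With several source paragraphs the key would need to combine num and group, for example use="concat(@num, '-', @group)", with the same expression in each key() call.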
Jay Bryant
Bryant Communication Services
(presently consulting at Synergistic Solution Technologies)