This is my first XSLT project. I have a recursive solution to a problem
which I hope one of you can improve on.
This is an abstraction of a problem that arose in the context of scraping PDF
docs.
PDF->Adobe->HTML->tidy->XML->scrape with XSLT->...
The PDF->HTML conversion, or for that matter, lassoing the text in Acrobat
Reader, cutting and pasting it, yields a different order than what is
displayed on screen by Acrobat Reader. It's not so badly mangled that it
can't be recovered. However, related items are no longer near one another. I
need to recover the original relationship between the related items.
I'm hoping someone can come up with a better solution than the one I present
below, which I believe is O(n squared), where n is large (the original
document is 170+ pages): each of the n steps looks up /list/b[$ix], which I
assume scans the b's from the start of the list. I've considered outputting
the a's and b's into two result files with two XSLT programs and processing
those. I think XSLT 1.1 would allow this to be done within a single XSLT
program by building two node sets, but I'd like to stick to 1.0, if possible.
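A minimal 1.0-only sketch of the two-node-set idea, for comparison: node sets selected directly from the source document can be held in variables without any extension, so no intermediate result files are needed. It assumes the list/a/b element names of the sample source shown further down. The variable names as and bs and the text output method are illustrative choices, not part of the original stylesheet. Whether $bs[$ix] is a constant-time lookup depends on the processor, so this may or may not improve on O(n squared).

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- assumption: plain text output, so no XML declaration is emitted -->
  <xsl:output method="text"/>

  <!-- two node sets taken straight from the source (hypothetical names) -->
  <xsl:variable name="as" select="/list/a"/>
  <xsl:variable name="bs" select="/list/b"/>

  <xsl:template match="/">
    <xsl:for-each select="$as">
      <!-- position() is the index of the current a within $as -->
      <xsl:variable name="ix" select="position()"/>
      <xsl:value-of select="$ix"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text> </xsl:text>
      <!-- $ix is a number, so this is a positional predicate on $bs -->
      <xsl:value-of select="$bs[$ix]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

This avoids the recursion entirely, which also sidesteps any stack-depth concern on a long document.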
XML source:
<?xml version="1.0"?>
<list>
<a>a1</a>
<a>a2</a>
<a>a3</a>
<b>b1</b>
<b>b2</b>
<a>a4</a>
<b>b3</b>
<a>a5</a>
<b>b4</b>
<a>a6</a>
<b>b5</b>
<b>b6</b>
</list>
The XSLT:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- match first a and make top level call to recursive template -->
  <xsl:template match="a[1]">
    <xsl:call-template name="do_a">
      <xsl:with-param name="ix" select="1" />
    </xsl:call-template>
  </xsl:template>

  <!-- recursive template counts a's -->
  <xsl:template name="do_a">
    <xsl:param name="ix" />
    <!-- output this a, starting on a new line -->
    <xsl:text>&#10;</xsl:text>
    <xsl:value-of select="$ix" /><xsl:text>: </xsl:text><xsl:value-of select="."/>
    <!-- output corresponding b -->
    <xsl:text> </xsl:text><xsl:copy-of select="/list/b[$ix]/text()" />
    <!-- This for-each moves to the next a; doesn't loop. -->
    <xsl:for-each select="following-sibling::a[1]">
      <!-- increment counter and output rest of a's -->
      <xsl:call-template name="do_a">
        <xsl:with-param name="ix" select="$ix+1" />
      </xsl:call-template>
    </xsl:for-each>
  </xsl:template>

  <!-- suppress other output -->
  <xsl:template match="text()" />

</xsl:stylesheet>
And this is the output (Saxon 6.5.1 with XFactor GUI):
<?xml version="1.0" encoding="utf-8"?>
1: a1 b1
2: a2 b2
3: a3 b3
4: a4 b4
5: a5 b5
6: a6 b6
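One possible variation, sketched here under assumptions rather than offered as a known fix: keep the recursion but carry the current b along as a second parameter, so each step moves to the next b with following-sibling::b[1] instead of re-evaluating /list/b[$ix] from the top. The template name pair, its parameter names, and the text output method are hypothetical. The recursion is still as deep as the number of a's, and whether a given processor optimises the tail call has not been checked here.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <!-- assumption: plain text output, so no XML declaration is emitted -->
  <xsl:output method="text"/>

  <!-- start both cursors at the first a and the first b -->
  <xsl:template match="/list">
    <xsl:call-template name="pair">
      <xsl:with-param name="a" select="a[1]"/>
      <xsl:with-param name="b" select="b[1]"/>
    </xsl:call-template>
  </xsl:template>

  <!-- hypothetical template: emits one a/b pair, then advances both cursors -->
  <xsl:template name="pair">
    <xsl:param name="a"/>
    <xsl:param name="b"/>
    <xsl:param name="ix" select="1"/>
    <!-- stop when the a's run out; a missing b is output as an empty string -->
    <xsl:if test="$a">
      <xsl:value-of select="concat($ix, ': ', $a, ' ', $b, '&#10;')"/>
      <xsl:call-template name="pair">
        <xsl:with-param name="a" select="$a/following-sibling::a[1]"/>
        <xsl:with-param name="b" select="$b/following-sibling::b[1]"/>
        <xsl:with-param name="ix" select="$ix + 1"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Each step only looks forward from the current b, so the total work should be closer to linear if the processor stops at the first matching sibling; that behaviour is an assumption about the implementation, not something XSLT 1.0 guarantees.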
Thanks,
Mat M.
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list