Hello!
I have a nested group structure in HTML that I would like to translate
to XML using XSL. In text form, here is what the HTML table looks
like:
Header1
Row 1
Row 2
...
Row N
Header2
Row 1
Row 2
...
Row M
...
I would like to create XML output that looks like:
<header attr=...>
<row>...</row>
<row>...</row>
...
<row>...</row>
</header>
<header attr=...>
<row>...</row>
<row>...</row>
...
<row>...</row>
</header>
...
This should be very straightforward, but the problem is that the nested
group structure is not reflected in the HTML. Rather than using nested
HTML tables, the whole thing is expressed as one big table with a lot
of rows:
<table class="results">
<tr>
<th>Header1</th>
</tr>
<tr>
<td>row1</td>
</tr>
<tr>
<td>row2</td>
</tr>
...
<tr>
<td>rowN</td>
</tr>
<tr>
<td> </td> <!-- blank line between groups -->
</tr>
<tr>
<th>Header2</th>
</tr>
<tr>
<td>row1</td>
</tr>
<tr>
<td>row2</td>
</tr>
...
<tr>
<td>rowM</td>
</tr>
...
</table>
The only good news is that the "header" rows are written using <th>
tags instead of <td> tags, so I can differentiate "headers" from
"rows". Inspired by some of the posts on this most excellent mailing
list, I came up with the following XSL to accomplish the task:
<xsl:for-each select='//TABLE[@class="results"]/TR[TH]'>
  <header>
    <xsl:attribute name=...>...</xsl:attribute>
    <xsl:variable name='thisHeader' select='generate-id(.)'/>
    <xsl:for-each
        select='following-sibling::TR[$thisHeader=generate-id(preceding-sibling::TR[TH][1])]'>
      <row>
        ...
      </row>
    </xsl:for-each>
  </header>
</xsl:for-each>
This works great, but it's pretty darn inefficient. I'm dealing with
tables that have hundreds of rows and around a dozen "header"
sections, so my nested for-each loops cause hundreds of TR nodes
to be evaluated about a dozen times each. I'm processing thousands of
HTML files, and there are 6 different types of HTML file, each with its
own XSL file for extracting data. None of the other HTML file types
has this weird structural problem, and they all process very quickly.
When one of these weird files is encountered, it takes 5-6 times longer
to process.
It seems like people exploit "keys" as much as possible when trying to
maximize processing efficiency, but I haven't been able to wrap my
head around a key-based solution for this problem yet.
Can someone think of a more efficient way of dealing with this case?
Thanks!
Peter
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe@lists.mulberrytech.com>
--~--