I have to scrape an HTML page, not something I do very often, so I pass
it through Tidy -asxml and end up with (XHTML):
...
<h3>Committees</h3>
<table style="width:99%;">
<tr class="colour2">
<td width="100%">
<b>Committee :</b> Association des études françaises et
francophones en Irlande<br/><b>From:</b>01-JAN-04
<b>To:</b> 30-DEC-99
</td>
</tr>
</table>
...
In my XSLT 2.0 stylesheet (Saxon 9 via Cocoon), with
xmlns:h="http://www.w3.org/1999/xhtml" declared and
<xsl:output method="html"/>, I have:
<xsl:copy-of select="//h:table
[preceding-sibling::*[1][local-name()='h3']]
[preceding-sibling::*[1][local-name()='h3']='Committees']"/>
which produces:
<table xmlns="http://www.w3.org/1999/xhtml" style="width:99%;">
<tr class="colour2">
<td width="100%">
<b>Committee :</b> Association des études françaises et
francophones en Irlande<br></br><b>From:</b>01-JAN-04
<b>To:</b> 30-DEC-99
</td>
</tr>
</table>
The <br/> in Tidy's generated XHTML is being expanded by the copy-of to
<br></br> instead of being contracted to <br>, as the HTML output
setting would imply. If copy-of is able to detect the <br/> and perform
an implicit transform like that, I'm puzzled as to why it does it that
way round.
I'm sure there is a good reason for it (although it is opaque to me), but
it results in IE rendering two line breaks, not one, and we can't go
upsetting IE users :-)
Is there a way to avoid this, or should I work around it by providing
suitable identity templates and using apply-templates instead of copy-of?
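
For concreteness, here is a minimal sketch of the workaround I have in
mind. It rests on an assumption I have not confirmed: that the XHTML
namespace on the copied elements is what stops the html output method
from treating <br> as an HTML empty element. The mode name "strip-ns" and
the match="/" wrapper are placeholders for illustration, not taken from
my real stylesheet.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="h">

  <xsl:output method="html"/>

  <!-- Placeholder entry point: select the same table as before, but
       process it with templates instead of copy-of. -->
  <xsl:template match="/">
    <xsl:apply-templates mode="strip-ns"
        select="//h:table
                [preceding-sibling::*[1][local-name()='h3']]
                [preceding-sibling::*[1][local-name()='h3']='Committees']"/>
  </xsl:template>

  <!-- Identity-style template: rebuild each element under its local
       name only, so the result element is in no namespace. Attributes
       are copied unchanged. Text nodes are copied by the built-in
       rules; comments and processing instructions are dropped. -->
  <xsl:template match="*" mode="strip-ns">
    <xsl:element name="{local-name()}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="strip-ns"/>
    </xsl:element>
  </xsl:template>

</xsl:stylesheet>
```

If the assumption holds, the serializer should then emit a plain <br>
for the no-namespace element, but it is a lot of machinery to replace a
one-line copy-of.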
[It's probably blindingly obvious, but at this point in the week I'm
just not seeing it :-]
///Peter
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--