It's likely that the HTML isn't well-formed XML, so you're going to have to
extract it as a string, put it through the tidy utility, parse it, and get
it back into the stylesheet in tree form before you can manipulate it at the
node level.
I would tend to do this as a non-XSLT stage in a processing pipeline; you
could also do it by calling out to an extension function.
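
A rough sketch of what such a non-XSLT stage might look like, in Python and using only the standard library. Note that this is not the tidy utility: html.parser is swapped in as a much simpler stand-in that merely closes elements left open, and the names SoupRepairer, repair_fragment, and the "body" wrapper element are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Elements that never take an end tag (a partial list, for illustration).
VOID = {"br", "hr", "img", "input", "meta", "link"}


class SoupRepairer(HTMLParser):
    """Hypothetical helper: turn tag soup into a well-formed tree by
    closing any elements still open when an ancestor's end tag arrives."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.builder = ET.TreeBuilder()
        self.open_tags = []
        self.builder.start("body", {})  # wrapper so a fragment has one root

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names; valueless attributes come as None.
        self.builder.start(tag, {k: v or "" for k, v in attrs})
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag: ignore it
        while self.open_tags:
            top = self.open_tags.pop()
            self.builder.end(top)
            if top == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def finish(self):
        self.close()  # flush any buffered text
        while self.open_tags:
            self.builder.end(self.open_tags.pop())
        self.builder.end("body")
        return self.builder.close()


def repair_fragment(xml_text):
    """Extract the string content of the root element and parse it as HTML."""
    root = ET.fromstring(xml_text)  # CDATA content arrives as plain text
    parser = SoupRepairer()
    parser.feed(root.text or "")
    return parser.finish()


doc = "<greeting><![CDATA[<P>Hello, <i>world!</P>]]></greeting>"
tree = repair_fragment(doc)
print(ET.tostring(tree, encoding="unicode"))
# prints: <body><p>Hello, <i>world!</i></p></body>
```

The resulting tree can then be serialized and handed to the stylesheet as ordinary well-formed input. A real pipeline would more likely call tidy itself, which knows far more about HTML's implied-end-tag rules than this sketch does.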
Of course, Michael is probably still using XSLT 1. Some of us have moved
up to XSLT 2 (there's a nice implementation called saxon8...), in which
case you can handle a fair amount of "non-well-formed HTML as a string"
using just XSLT 2 functions.
For example:
h.xml:
<greeting><![CDATA[<P>Hello, <i>world!</P>]]></greeting>
h.xsl:
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:d="data:,dpc"
                exclude-result-prefixes="d">
  <xsl:import href="http://www.dcarlisle.demon.co.uk/htmlparse.xsl"/>
  <xsl:output method="html"/>
  <xsl:template match="/">
    <html>
      <head>
        <title>Today's greeting</title>
      </head>
      <body>
        <xsl:copy-of select="d:htmlparse(string(greeting[1]),'',true())/node()"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
$ saxon8 h.xml h.xsl
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Today's greeting</title>
</head>
<body>
<p>Hello, <i>world!</i></p><i></i></body>
</html>
The <i></i> there is an artifact of the parser's HTML "recovery" mode, which
re-opens automatically closed elements (looks like I should improve
that a bit one day). You can turn that off by changing true() in the
above call to false(), and then you get
$ saxon8 h.xml h.xsl
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Today's greeting</title>
</head>
<body>
<P>Hello, <i>world!</i></P>
</body>
</html>
So now the <i> element has been closed, but no lowercasing or other
HTML-specific transformations have been done, and <i> isn't re-opened.
David