Hi Martin,
Thank you for this. It looks very elegant.
Can you please explain the idea of the line:
<xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]]"/>
Does it remove any p whose immediately preceding sibling p has a last
non-chapter span that does not end with a paragraph-ending character?
I tried it with a more complete example like the following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml;
charset=utf-8"/>
<title/>
<link href="test.css" rel="stylesheet" type="text/css"/>
</head>
<body>
<p dir="rtl">
<span class="chapter">line1</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line3.</span>
<span class="italic">line4</span>
<span class="regular">line5."</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line6.</span>
<br />
<span class="regular">line7</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line8.</span>
<span class="regular">line9.</span>
</p>
</body>
</html>
The output was:
<?xml version="1.0" encoding="UTF-8"?><html
xmlns="http://www.w3.org/1999/xhtml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xml:lang="en"
version="-//W3C//DTD XHTML 1.1//EN">
<head profile="">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title></title>
<link href="test.css" rel="stylesheet" type="text/css"
xml:space="preserve" />
</head>
<body xml:space="preserve">
<p dir="rtl" xml:space="preserve">
<span class="chapter" xml:space="preserve">line1</span>
</p>
<p dir="rtl" xml:space="preserve"> <br xml:space="preserve" />
<span class="regular" xml:space="preserve">line3.</span>
<span class="italic" xml:space="preserve">line4</span>
<span class="regular" xml:space="preserve">line5."</span>
</p>
<p dir="rtl" xml:space="preserve"> <br xml:space="preserve" />
<span class="regular" xml:space="preserve">line6.</span>
<br xml:space="preserve" />
<span class="regular" xml:space="preserve">line7</span>
<br xml:space="preserve" />
<span class="regular" xml:space="preserve">line8.</span>
<span class="regular" xml:space="preserve">line9.</span>
</p>
</body>
</html>
How can I remove the following:
1. The extra xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" and
version="-//W3C//DTD XHTML 1.1//EN" on the html element.
2. The extra profile="" on the head element.
3. The extra xml:space="preserve" on the p, span and br elements.
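My guess is that none of these are created by the stylesheet itself, but are defaults that the parser adds from the XHTML 1.1 DTD when it reads the input. If that assumption is right, something like the following sketch might suppress them. It is untested; the priority value and the choice of which attributes to drop are my own assumptions, and the paragraph-merging templates would probably also need copy-namespaces="no" on their xsl:copy:

```xslt
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml"
    version="2.0">

  <xsl:output method="xhtml"/>

  <!-- Identity template for attributes, text, comments and
       processing instructions. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Elements: copy without in-scope namespaces they do not use,
       which should drop the xmlns:xsi declaration. The explicit
       priority only exists to win over the identity template. -->
  <xsl:template match="*" priority="0">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumed DTD-defaulted attributes: match them and output nothing. -->
  <xsl:template match="@xml:space | html/@version | head/@profile"/>

</xsl:stylesheet>
```

The drawback I can see is that this would also drop any xml:space attribute that was deliberately written in the source.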
Thanks, Viente
On Sun, Jun 14, 2009 at 6:50 PM, Martin
Honnen<Martin(_dot_)Honnen(_at_)gmx(_dot_)de> wrote:
Israel Viente wrote:
My input is something like the following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p dir="rtl">
<span class="chapter">line1</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line3.</span>
<span class="italic">line4</span>
<span class="regular">line5."</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line6.</span>
<br />
<span class="regular">line7</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line8.</span>
<span class="regular">line9.</span>
</p>
</body>
</html>
The resulting output should be:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p dir="rtl">
<span class="chapter">line1</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line3.</span>
<span class="italic">line4</span>
<span class="regular">line5."</span>
</p>
<p dir="rtl"> <br />
<span class="regular">line6.</span>
<br />
<span class="regular">line7</span>
<span class="regular">line8.</span>
<span class="regular">line9.</span>
</p>
</body>
</html>
For every p that contains span elements whose class is not 'chapter',
check whether the text of the last such span ends with one of the
characters . ? " ! (a paragraph-ending character).
If it does, copy the p to the output as is.
Otherwise, move the span elements from the next p into the current one
and remove the next p completely.
Here is an attempt at solving that with XSLT 2.0:
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
version="2.0">
<xsl:output method="xhtml"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[span[@class ne 'chapter'] and
not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]">
<xsl:copy>
<xsl:apply-templates select="@* | node() |
following-sibling::p[1]/node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p[preceding-sibling::p[1][span[@class ne 'chapter']
and not(matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$'))]]"/>
</xsl:stylesheet>
For the posted input using Saxon 9 it produces the described output but I
have not tested with other inputs.
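One case the templates above probably do not handle is a run of three or more paragraphs where several in a row lack an ending character: as far as I can see, only the contents of the immediately following p are pulled in, and later ones would be dropped. A possible alternative is to group the p elements so that each group ends at a paragraph that is complete. The following is an untested sketch, assuming body contains only p elements and whitespace; it merges all child nodes of the grouped paragraphs, not only the span elements:

```xslt
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml"
    version="2.0">

  <xsl:output method="xhtml"/>

  <!-- Identity template: copy everything not handled below. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="body">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- A group ends at a p that has no non-chapter span at all, or
           whose last non-chapter span ends with . ? " or ! -->
      <xsl:for-each-group select="p"
          group-ending-with="p[not(span[@class ne 'chapter'])
              or matches(span[@class ne 'chapter'][last()], '[.?&quot;!]$')]">
        <!-- The context item is the first p of the group: copy it,
             keep its attributes, then output the child nodes of
             every p in the group. -->
        <xsl:copy>
          <xsl:apply-templates select="@*, current-group()/node()"/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With this approach the two p templates above are no longer needed, because the grouping decides both which paragraphs are merged and which disappear.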
--
Martin Honnen
http://msmvps.com/blogs/martin_honnen/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail:
<mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--