xsl-list

[xsl] XHTML DTD aware transformation and indentation behaviour

2012-02-02 04:49:23
Hi all,

In my project I concatenate multiple XHTML files into one XML file. This aggregate file has to be edited by hand, so readable indentation matters.

Before I discovered XML Catalogs, I used a Perl script to delete the DOCTYPE declaration from each source XHTML file (the script also replaced named entities with the corresponding UTF-8 characters). This worked fine: the concatenated files were indented exactly like the XHTML sources.

But this was risky: if the Perl script missed a named entity, that entity was left undefined once the DOCTYPE was gone. It was also not good XML practice.
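A minimal Python sketch of that kind of preprocessing, to make the risk concrete (the real script was Perl, so this is an illustration, not that script). It assumes the DOCTYPE has no internal subset and uses Python's `html.entities.name2codepoint` table, which essentially covers the HTML 4 named entities that the XHTML 1.x DTDs also declare. `preprocess` is a hypothetical name. Unlike a silent search-and-replace, it fails loudly on an entity it does not know.

```python
import re
from html.entities import name2codepoint

# The five XML predefined entities must stay escaped, or the output is no longer well-formed.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Assumption: the DOCTYPE has no internal subset ([...]).
DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>\s*")
# Named references only; numeric ones such as &#160; start with '#' and are left alone.
NAMED_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def preprocess(xhtml: str) -> str:
    """Drop the DOCTYPE and replace named entities with literal characters."""
    text = DOCTYPE_RE.sub("", xhtml, count=1)

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)
        if name not in name2codepoint:
            # The risky case: without the DTD, this entity would break parsing later.
            raise ValueError(f"unknown entity &{name};")
        return chr(name2codepoint[name])

    return NAMED_ENTITY_RE.sub(replace, text)

src = ('<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
       '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n'
       '<p>&Agrave; savoir&nbsp;!</p>')
print(preprocess(src))
```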

Now I use a local XML Catalog, and I run the transformation with Saxon from the command line with these options:

-r:org.apache.xml.resolver.tools.CatalogResolver
-x:org.apache.xml.resolver.tools.ResolvingXMLReader
-y:org.apache.xml.resolver.tools.ResolvingXMLReader
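For reference, the catalog's job here is to map the public and system identifiers of the XHTML 1.1 DOCTYPE to a local copy of the DTD, so the parser reads the DTD without fetching it from w3.org. A simplified Python sketch of that lookup, under stated assumptions: only OASIS `system` and `public` entries are handled, `system` entries are tried first, relative `uri` values are not resolved against the catalog's base, and the local path `dtd/xhtml11.dtd` and the function name `resolve` are hypothetical.

```python
from typing import Optional
import xml.etree.ElementTree as ET

NS = {"c": "urn:oasis:names:tc:entity:xmlns:xml:catalog"}

# Hypothetical catalog: both identifiers of the XHTML 1.1 DOCTYPE point at a local copy.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd" uri="dtd/xhtml11.dtd"/>
  <public publicId="-//W3C//DTD XHTML 1.1//EN" uri="dtd/xhtml11.dtd"/>
</catalog>"""

def resolve(catalog_xml: str, public_id: Optional[str],
            system_id: Optional[str]) -> Optional[str]:
    """Return the local URI for an external identifier, or None if nothing matches."""
    root = ET.fromstring(catalog_xml)
    # System identifiers are checked before public ones.
    if system_id is not None:
        for entry in root.findall("c:system", NS):
            if entry.get("systemId") == system_id:
                return entry.get("uri")
    if public_id is not None:
        for entry in root.findall("c:public", NS):
            if entry.get("publicId") == public_id:
                return entry.get("uri")
    return None

print(resolve(CATALOG, "-//W3C//DTD XHTML 1.1//EN", None))  # dtd/xhtml11.dtd
```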

Now to the problem. My stylesheet is a simple identity transform:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xhtml" indent="no" encoding="UTF-8" omit-xml-declaration="no" doctype-public="-//W3C//DTD XHTML 1.1//EN" doctype-system="http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"/>

  <!-- Identity copy; text nodes are copied by the built-in template rule for mode "copy" -->
  <xsl:template match="* | @* | processing-instruction() | comment()" mode="copy">
    <xsl:copy copy-namespaces="no">
      <xsl:apply-templates select="node()|@*" mode="copy"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/">
    <xsl:apply-templates mode="copy"/>
  </xsl:template>

</xsl:stylesheet>

This is my XML source:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>title</title>
<link href="my.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="my.js"></script>
</head>
<body>
<div class="body">
<div class="pageTitre_container">
<h1>
<span>Title 1</span>
</h1>
<p><span class="big">This</span> is <span class="little">a paragraphe</span></p> <p><span class="big">This</span> is <span class="little">a paragraphe</span></p>
</div>
</div>
<table>
<caption>This is a table</caption>
<thead>
<tr>
<td>Col 1</td>
<td>Col 2</td>
<td>Col 3</td>
<td>Col 4</td>
<td>Col 5</td>
</tr>
</thead>
<tbody>
<tr>
<td> </td>
<td colspan="3" rowspan="7">
<p class="entitre-en-savoir-">À savoir</p>
<p class="no">
<span class="no-style-override-5">Certains grands magasins proposent des comparatifs très complets, prenez le temps de les parcourir. Vous pouvez également chercher des infos sur Internet via les sites des fabricants, ou sur les forums&#160;: rien ne vaut l’avis d’un consommateur pour se faire une idée précise du produit&#160;!</span>
</p>
</td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
</body>
</html>

This gives the following output:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /><title>title</title><link href="my.css" rel="stylesheet" type="text/css" /><script type="text/javascript" src="my.js"></script></head><body><div class="body">
<div class="pageTitre_container">
<h1>
<span>Title 1</span>
</h1>
<p><span class="big">This</span> is <span class="little">a paragraphe</span></p> <p><span class="big">This</span> is <span class="little">a paragraphe</span></p>
</div>
</div><table><caption>This is a table</caption><thead><tr><td>Col 1</td><td>Col 2</td><td>Col 3</td><td>Col 4</td><td>Col 5</td></tr></thead><tbody><tr><td> </td><td colspan="3" rowspan="7">
<p class="entitre-en-savoir-">À savoir</p>
<p class="no">
<span class="no-style-override-5">Certains grands magasins proposent des comparatifs très complets, prenez le temps de les parcourir. Vous pouvez également chercher des infos sur Internet via les sites des fabricants, ou sur les forums : rien ne vaut l’avis d’un consommateur pour se faire une idée précise du produit !</span>
</p>
</td><td> </td></tr><tr><td> </td><td> </td></tr><tr><td> </td><td> </td></tr><tr><td> </td><td> </td></tr><tr><td> </td><td> </td></tr><tr><td> </td><td> </td></tr><tr><td> </td><td> </td></tr><tr><td> </td><td> </td><td> </td><td> </td><td> </td></tr></tbody></table></body></html>

If I comment out the DOCTYPE in the source, I get:

<?xml version="1.0" encoding="UTF-8"?><!--<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">-->
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>title</title>
<link href="my.css" rel="stylesheet" type="text/css" />
<script type="text/javascript" src="my.js"></script>
</head>
<body>
<div class="body">
<div class="pageTitre_container">
<h1>
<span>Title 1</span>
</h1>
<p><span class="big">This</span> is <span class="little">a paragraphe</span></p> <p><span class="big">This</span> is <span class="little">a paragraphe</span></p>
</div>
</div>
<table>
<caption>This is a table</caption>
<thead>
<tr>
<td>Col 1</td>
<td>Col 2</td>
<td>Col 3</td>
<td>Col 4</td>
<td>Col 5</td>
</tr>
</thead>
<tbody>
<tr>
<td> </td>
<td colspan="3" rowspan="7">
<p class="entitre-en-savoir-">À savoir</p>
<p class="no">
<span class="no-style-override-5">Certains grands magasins proposent des comparatifs très complets, prenez le temps de les parcourir. Vous pouvez également chercher des infos sur Internet via les sites des fabricants, ou sur les forums : rien ne vaut l’avis d’un consommateur pour se faire une idée précise du produit !</span>
</p>
</td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
</tr>
<tr>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
<td> </td>
</tr>
</tbody>
</table>
</body>
</html>


The head element and the table now keep the line breaks from the source, which is what I want. But I don't want to comment out the DOCTYPE in the source.

Does this have something to do with the XHTML DTD content model? Any idea how to get this result while keeping the DOCTYPE?
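To make the question concrete, here is a small Python sketch of what I suspect happens. It assumes that, once the DTD is read, the parser reports whitespace-only text inside elements whose content model is element-only (no #PCDATA) as ignorable, and that this whitespace is then dropped. The `ELEMENT_ONLY` set is a hand-picked, hypothetical subset of the XHTML 1.1 declarations, not read from the DTD, and `strip_ignorable` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
XML_WS = " \t\r\n"  # XML whitespace only (str.strip() alone would also remove U+00A0)

# Assumption: XHTML 1.1 elements whose DTD content model is element-only.
ELEMENT_ONLY = {"html", "head", "body", "table", "thead", "tbody", "tr"}

def local_name(tag: str) -> str:
    """Strip the XHTML namespace from an ElementTree tag."""
    return tag[len(XHTML_NS):] if tag.startswith(XHTML_NS) else tag

def strip_ignorable(elem: ET.Element) -> ET.Element:
    """Remove whitespace-only text directly inside element-only elements, recursively."""
    if local_name(elem.tag) in ELEMENT_ONLY:
        # Text before the first child.
        if elem.text is not None and not elem.text.strip(XML_WS):
            elem.text = None
        # Text after each child (ElementTree stores it in the child's tail).
        for child in elem:
            if child.tail is not None and not child.tail.strip(XML_WS):
                child.tail = None
    for child in elem:
        strip_ignorable(child)
    return elem

sample = "<table>\n<tr>\n<td> </td>\n<td><p>x</p>\n</td>\n</tr>\n</table>"
print(ET.tostring(strip_ignorable(ET.fromstring(sample)), encoding="unicode"))
```

Under that assumption the line breaks vanish inside html, head, body, table and tr but survive inside div, p and td, whose content models allow text, which mirrors the first output above.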

Thanks,

Matthieu.
