xsl-list

Re: [xsl] XHTML DTD aware transformation and indentation behaviour

2012-02-02 05:09:58
On 02/02/2012 10:48, Matthieu Ricaud-Dussarget wrote:
Hi all,

In my project I concatenate multiple XHTML files into a single XML file. This aggregate file has to be edited by hand, so its indentation matters for convenience.

Before I discovered XML Catalog, I used a Perl script to delete all DOCTYPE declarations from the source XHTML files (it also replaced named entities with their UTF-8 characters). This worked fine: the concatenated file was indented exactly like the XHTML sources.

But this was a bit risky, since the Perl script could miss a special entity that needed replacing, and it was not really good XML practice.

Now that I'm using a local XML Catalog and running my transformation with Saxon on the command line with these options: -r:org.apache.xml.resolver.tools.CatalogResolver -x:org.apache.xml.resolver.tools.ResolvingXMLReader -y:org.apache.xml.resolver.tools.ResolvingXMLReader

I can't see exactly what's happening here because your mail client and mine have conspired to ignore the whitespace which was critical to understanding your message.

Generally, if you validate against a DTD, then whitespace in elements whose content model is defined as element-only (for example head and body) will be treated as ignorable, which means it's liable to be lost in a copy operation. Perhaps this is what is happening.
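
To make the effect concrete, here is a rough Python sketch of what losing ignorable whitespace looks like. It is only a simulation, not what Saxon or the XML parser actually does internally: the ELEMENT_ONLY set is an assumption standing in for the content models a DTD would declare, and strip_ignorable is a made-up helper name.

```python
from xml.dom import minidom

# Assumption: this set stands in for the elements a DTD would declare
# with element-only content; a real parser derives it from the DTD.
ELEMENT_ONLY = {"html", "head", "body"}


def strip_ignorable(node):
    """Drop whitespace-only text nodes directly inside element-only elements."""
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE:
            if node.nodeName in ELEMENT_ONLY and not child.data.strip():
                node.removeChild(child)
        else:
            strip_ignorable(child)


source = """<html>
  <head>
    <title>Demo</title>
  </head>
  <body>
    <p>Some  text</p>
  </body>
</html>"""

doc = minidom.parseString(source)
strip_ignorable(doc.documentElement)
result = doc.documentElement.toxml()
print(result)
```

The indentation between html, head and body disappears, while the text inside title and p (mixed content) is left alone, which matches the kind of loss you would see after a copy.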

Try the option -strip:none on the command line to prevent this behaviour. The documentation says this is the default, but I'm not convinced it is correct: I seem to remember it changing some time ago in response to a W3C change.
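
If you drive Saxon from a script, the full invocation might be assembled roughly like this. This is a sketch under assumptions: the file names and the classpath (jar names, Unix-style ":" separator) are hypothetical placeholders, and saxon_command is an illustrative helper; only the -r, -x, -y values come from your message and -strip:none from the suggestion above.

```python
RESOLVER = "org.apache.xml.resolver.tools.CatalogResolver"
READER = "org.apache.xml.resolver.tools.ResolvingXMLReader"


def saxon_command(source, stylesheet, output, classpath):
    """Return the argv list for a Saxon run that asks for no whitespace stripping."""
    return [
        "java", "-cp", classpath, "net.sf.saxon.Transform",
        f"-s:{source}", f"-xsl:{stylesheet}", f"-o:{output}",
        f"-r:{RESOLVER}", f"-x:{READER}", f"-y:{READER}",
        "-strip:none",
    ]


# Hypothetical file and jar names, for illustration only.
cmd = saxon_command(
    "sources.xml", "concat.xsl", "aggregate.xml",
    "saxon9he.jar:resolver.jar",
)
print(" ".join(cmd))
```

The list can be passed to subprocess.run as is; comparing the output with and without -strip:none should show whether stripping is what removes your indentation.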

Michael Kay
Saxonica
