On 02/02/2012 10:48, Matthieu Ricaud-Dussarget wrote:
Hi all,
In my project I concatenate multiple XHTML files into one XML file.
This aggregate file has to be edited by hand, which means indentation
is important here for convenience.
Before I discovered XML Catalog, I used to delete all DOCTYPE
declarations within the source XHTML files with a Perl script (which also
replaced named entities with UTF-8 characters). This worked fine: the
concatenated files were indented exactly like the XHTML sources.
But this was a bit dangerous in case my Perl script failed to match some
entity that needed replacing. And this was not really good XML practice.
Now that I'm using a local XML Catalog and run my transformation with
Saxon on the command line with these options:
-r:org.apache.xml.resolver.tools.CatalogResolver
-x:org.apache.xml.resolver.tools.ResolvingXMLReader
-y:org.apache.xml.resolver.tools.ResolvingXMLReader
I can't see exactly what's happening here because your mail client and
mine have conspired to ignore the whitespace which was critical to
understanding your message.
Generally, if you validate against a DTD, then whitespace in elements
whose content model is defined as element-only (for example head and
body) will be treated as ignorable, which means it's liable to be lost
in a copy operation. Perhaps this is what is happening.
Try the option -strip:none on the command line to prevent this
behaviour. The documentation says this is the default, but I'm not
convinced it is correct: I seem to remember it changing some time ago in
response to a W3C change.
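
For illustration, here is a rough sketch of what the full command line might look like with -strip:none added to the three resolver options quoted above. The jar names and the input, stylesheet and output file names are placeholders I have made up, and the classpath separator assumes a Unix-like shell. The script only prints the command so it can be checked before being run.

```shell
#!/bin/sh
# Placeholder jar and file names (assumptions); adjust to your installation.
SAXON_JAR="saxon9he.jar"
RESOLVER_JAR="resolver.jar"
SOURCE="source.xhtml"
STYLESHEET="concat.xsl"
OUTPUT="aggregate.xml"

# Print the Saxon command line, one space-separated string.
# -strip:none asks Saxon not to strip any whitespace text nodes;
# -r, -x and -y are the resolver options from the original message.
build_saxon_cmd() {
  printf '%s ' \
    java -cp "$SAXON_JAR:$RESOLVER_JAR" net.sf.saxon.Transform \
    -strip:none \
    -r:org.apache.xml.resolver.tools.CatalogResolver \
    -x:org.apache.xml.resolver.tools.ResolvingXMLReader \
    -y:org.apache.xml.resolver.tools.ResolvingXMLReader \
    "-s:$SOURCE" "-xsl:$STYLESHEET" "-o:$OUTPUT"
  printf '\n'
}

build_saxon_cmd
```

If the indentation still disappears with this option, comparing the output against a run with -strip:ignorable should show whether DTD-based stripping is really the cause.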
Michael Kay
Saxonica