At 10:55 AM 8/28/2003, Taro wrote (to Mike):
> Your book talks about canonical documents, and what I want is
> a tool that indents a document without changing its semantics.
But that's the nub of the problem.
Define the "semantics" of white space, and we're done. :->
In SGML, in which the DTD is required for a document to be processed, it is
possible to define "insignificant" whitespace by reference to an element's
content model. (If #PCDATA appears, whitespace is significant.)
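To make that rule concrete, here is a rough Python sketch. It is an illustration only, not how any SGML tool actually works. It assumes XML-style <!ELEMENT> declarations without SGML tag-omission flags or parameter entities, and the DTD and the function name strippable_elements are hypothetical. It collects the element types whose declared content is element-only (no #PCDATA), i.e. the ones whose whitespace the simplified rule above would treat as insignificant.

```python
import re

# Rough match for XML-style declarations such as <!ELEMENT para (#PCDATA | emph)*>.
# Assumption: no SGML tag-omission flags ("- O") and no parameter entities.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+(\S+)\s+(.*?)>", re.S)

def strippable_elements(dtd_text: str) -> set[str]:
    """Element types whose content model is element-only (no #PCDATA)."""
    names = set()
    for name, model in ELEMENT_DECL.findall(dtd_text):
        model = model.strip()
        # Mixed content (#PCDATA) keeps its whitespace; ANY admits character
        # data too, and EMPTY has no content at all, so only a parenthesized
        # model without #PCDATA qualifies.
        if model.startswith("(") and "#PCDATA" not in model:
            names.add(name)
    return names

# Hypothetical DTD fragment.
DTD = """
<!ELEMENT doc   (title, para+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT para  (#PCDATA | emph)*>
<!ELEMENT emph  (#PCDATA)>
<!ELEMENT list  (item+)>
<!ELEMENT item  (para+)>
<!ELEMENT br    EMPTY>
"""

print(" ".join(sorted(strippable_elements(DTD))))  # prints "doc item list"
```

A space-separated list like that is the kind of thing one would otherwise have to type by hand into the elements attribute of xsl:strip-space.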
In XML, in which a DTD may or may not be processed, it's impossible in the
general case to define which whitespace is significant (and must therefore
be left alone) and which is insignificant (and may be munged and remunged
without damage). To ameliorate this, XSLT gives you xsl:preserve-space and
xsl:strip-space, which allow you some control by element type. This makes
the problem more tractable in XSLT, provided you're willing to hand-wire
the semantics in at that level.
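Python's standard library has no XSLT processor, so the following is only a sketch that emulates, over xml.etree.ElementTree, a simplified form of the XSLT 1.0 whitespace-stripping rule (section 3.4): a whitespace-only text node is removed when its parent's name is in the strip set, unless the name is in the preserve set or an xml:space="preserve" is in effect. Simplifications: no namespace-qualified name tests, no import precedence, and a name listed in both sets is simply preserved. The function strip_space and the sample document are hypothetical. The usage at the end shows why the hand-wiring matters: a blanket strip of "*" quietly deletes the space between two inline elements in mixed content.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def _ws_only(text):
    # XPath whitespace is exactly space, tab, CR and LF.
    return text is not None and text.strip(" \t\r\n") == ""

def strip_space(root, strip, preserve=frozenset()):
    """Simplified emulation of xsl:strip-space / xsl:preserve-space.

    `strip` and `preserve` hold element names; "*" matches any element.
    A specific name outranks "*", as with XSLT's default priorities.
    """
    def preserving(tag):
        if tag in preserve:
            return True
        if tag in strip:
            return False
        return "*" not in strip or "*" in preserve

    def walk(elem, space_preserve):
        # The nearest xml:space (on elem or an ancestor) decides.
        mode = elem.get(XML_SPACE)
        if mode == "preserve":
            space_preserve = True
        elif mode == "default":
            space_preserve = False
        if not (space_preserve or preserving(elem.tag)):
            # elem.text and each child's tail are text nodes whose parent is elem.
            if _ws_only(elem.text):
                elem.text = None
            for child in elem:
                if _ws_only(child.tail):
                    child.tail = None
        for child in elem:
            walk(child, space_preserve)

    walk(root, False)
    return root

src = ("<doc>\n"
       "  <p><b>Hello</b> <i>world</i></p>\n"
       "  <pre>  <i>x</i>\n</pre>\n"
       "  <poem xml:space='preserve'> <l>a</l> </poem>\n"
       "</doc>")
doc = strip_space(ET.fromstring(src), strip={"*"}, preserve={"pre"})
print("".join(doc.find("p").itertext()))  # prints "Helloworld"
```

The indentation between block elements goes away as intended, and pre and the xml:space="preserve" poem keep their whitespace, but the space between the b and i children of p is lost too. Adding p (or every mixed-content element) to the preserve set avoids that, which is exactly the per-element-type knowledge that has to be wired in by hand.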
It'd be nice if xmllint left CDATA marked sections alone, but that's
just the tip of the iceberg. (Think of how much poetry on the web is marked
up with <pre> to control whitespace. Ugh.)
Cheers,
Wendell
======================================================================
Wendell Piez
mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc. http://www.mulberrytech.com
17 West Jefferson Street Direct Phone: 301/315-9635
Suite 207 Phone: 301/315-9631
Rockville, MD 20850 Fax: 301/315-8285
----------------------------------------------------------------------
Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list