Re: [xsl] preserve structure of input XML file

2011-05-17 18:36:00
Another question is what is your output?

As Wendell points out, if your output is XML, whitespace is usually preserved. But if you're trying to generate HTML from an XML element like:

<abstract>
 The primary parameters measured in this dataset are:
            - temperature
            - wind speed
            - humidity

   The units are:

   Temperature               Wind Speed           Humidity
    ==============================
   degrees C                   km/h                        percent

Global Attributes of level 1a datasets are: Mission and Documentation, Data Time, Data Quality, File Metrics, and Scene Coordinates. Vgroups included in the dataset are Scan-Line Attributes, Raw SeaStar Data, Converted Telemetry, Navigation, Sensor Tilt, and Calibration. Of the six Vgroups, four Vgroups, Scan-Line Attributes, Raw SeaStar Data, Converted Telemetry, and Navigation, contain data that are functions of scan lines.
</abstract>

and your XSLT does:

<p><xsl:value-of select="abstract"/></p>

Any HTML browser would collapse all your significant whitespace, losing the indenting and the table, squishing everything together into an unreadable mess.

If you simply used <pre>...</pre> instead, then you'd keep the indenting and table, but the final paragraph would scroll endlessly to the right, rather than wrapping with the window size.
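
To make the two options concrete, here is a minimal XSLT 1.0 sketch that emits the same abstract both ways so you can compare them in a browser. It assumes the input contains an abstract element like the one above; text elsewhere in the input is passed through by the built-in template rules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="abstract">
    <!-- Option 1: the browser collapses every run of whitespace,
         so the indented list and the table are lost. -->
    <p><xsl:value-of select="."/></p>

    <!-- Option 2: the browser keeps all whitespace, but long lines
         are not wrapped to the window width. -->
    <pre><xsl:value-of select="."/></pre>
  </xsl:template>
</xsl:stylesheet>
```

In both cases the whitespace is present in the HTML output; the difference is only in how the browser renders it.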

If this is your problem, you might consider using our printFormatted.xsl template, which tries to guess the author's intent: it preserves whitespace where it finds consecutive spaces or tabs, and outputs an ordinary paragraph otherwise:

http://www.ngdc.noaa.gov/metadata/published/views/xml2text/xml-to-text-ISO.xsl

which imports:

http://www.ngdc.noaa.gov/metadata/published/views/xml2text/printFormatted.xsl

We've found it to work reasonably well on many different combinations of whitespace.
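
For readers who just want the idea, the following is a rough XSLT 1.0 sketch of that kind of heuristic. It is not the code from printFormatted.xsl: the template names format-blocks and format-one-block are made up for illustration, and it assumes that blocks are separated by completely empty lines (two consecutive line feeds). Each block that contains two consecutive spaces or a tab goes into a pre element; anything else becomes an ordinary paragraph.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>

  <xsl:template match="abstract">
    <xsl:call-template name="format-blocks">
      <xsl:with-param name="text" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper: split the text at empty lines and
       format each block in turn (simple recursion). -->
  <xsl:template name="format-blocks">
    <xsl:param name="text"/>
    <xsl:variable name="sep" select="'&#10;&#10;'"/>
    <xsl:choose>
      <xsl:when test="contains($text, $sep)">
        <xsl:call-template name="format-one-block">
          <xsl:with-param name="block"
                          select="substring-before($text, $sep)"/>
        </xsl:call-template>
        <xsl:call-template name="format-blocks">
          <xsl:with-param name="text"
                          select="substring-after($text, $sep)"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:call-template name="format-one-block">
          <xsl:with-param name="block" select="$text"/>
        </xsl:call-template>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Hypothetical helper: decide how a single block is output. -->
  <xsl:template name="format-one-block">
    <xsl:param name="block"/>
    <xsl:choose>
      <!-- Skip blocks that are empty or whitespace only. -->
      <xsl:when test="normalize-space($block) = ''"/>
      <!-- Two spaces in a row or a tab: assume deliberate layout. -->
      <xsl:when test="contains($block, '  ') or contains($block, '&#9;')">
        <pre><xsl:value-of select="$block"/></pre>
      </xsl:when>
      <!-- Otherwise treat it as running text that may wrap. -->
      <xsl:otherwise>
        <p><xsl:value-of select="normalize-space($block)"/></p>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the abstract above, the indented list and the units table should come out preformatted, while the long final paragraph should come out as a normal paragraph that wraps. Input whose blank lines contain stray spaces would need a more forgiving split than this sketch has.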

! or ?
--Rich

Richard Fozzard, Computer Scientist
 Geospatial Metadata at NGDC: http://www.ngdc.noaa.gov/metadata

Cooperative Institute for Research in Environmental Sciences (CIRES)
Univ. Colorado & NOAA National Geophysical Data Center, Enterprise Data Systems
325 S. Broadway, Skaggs 1B-305, Boulder, CO 80305
Office: 303-497-6487, Cell: 303-579-5615, Email: richard(_dot_)fozzard(_at_)noaa(_dot_)gov



Wendell Piez said the following on 05/17/2011 09:35 AM:
Hi,

On 5/17/2011 11:04 AM, a kusa wrote:
I need to preserve all the whitespace and structure of an input XML
file. How do I do that in XSLT? I tried using fixed width fonts like
monospace but that does not work.

Hm. Unfortunately -- assuming you're talking about whitespace in a transformation result, not in how it's displayed -- trying to address this issue using fixed width fonts is like trying to ride a bike by wearing a red shirt. Riding a bike involves momentum, balance, and control of the bike, not color coordination.

As for the answer to your question, that depends on information you haven't given us. All other things being equal, an XSLT transformation will always preserve whitespace from the input XML, and can preserve exactly as much of the structure from it as you like.

But other things are rarely equal. Whitespace can be affected by how your input is parsed and processed (for example, if a schema is present and how it is being used) and how your output is being serialized (since sometimes people want whitespace added then), as well as by details of what the XSLT does.

Some basic rules of thumb if you want whitespace preserved:
  * Don't use MSXML or another processor known to discard whitespace
  * Don't use xsl:strip-space (which strips whitespace from the source)
    or xsl:output/@indent='yes' (which adds it in serialization)
  * Don't use a schema for your input, or if you do (in XSLT 2.0) make
    sure your processor is set to leave whitespace alone
  * Don't add whitespace in your templates

(Others might have things to add to this.)
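
As a starting point that follows these rules, here is a minimal sketch of an identity transform in XSLT 1.0. It has no xsl:strip-space declaration, sets indent="no" explicitly, and adds no literal whitespace of its own in the template. It assumes the processor and parser are not stripping whitespace before the stylesheet sees the input.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- indent="no": the serializer must not add whitespace. -->
  <xsl:output method="xml" indent="no"/>

  <!-- No xsl:strip-space here, so whitespace-only text nodes
       in the source are kept. -->

  <!-- Identity template: copy every attribute and node as-is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

If the output of this stylesheet already differs in whitespace from the input, the cause is more likely the parser, the processor, or the serializer settings than the templates.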

As for the structure of your XML, how much of it is preserved depends entirely on which elements in the input you are matching and what you are doing with them.
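
A small sketch of what that means in practice, in XSLT 1.0: an identity template keeps everything by default, and extra templates change only the elements they match. The element names summary and keywords are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- Default: copy every attribute and node unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Rename abstract to summary (hypothetical name),
       keeping its attributes and content, whitespace included. -->
  <xsl:template match="abstract">
    <summary>
      <xsl:apply-templates select="@*|node()"/>
    </summary>
  </xsl:template>

  <!-- Drop keywords elements (hypothetical) and their content. -->
  <xsl:template match="keywords"/>
</xsl:stylesheet>
```

Everything not matched by the two specific templates passes through untouched. Note that dropping an element leaves the whitespace text nodes around it in place, so an extra blank line may remain where it was.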

So it's not really a simple question. It is probably best addressed here by giving a small demonstration example, with a specification of the desired result.

Cheers,
Wendell

======================================================================
Wendell Piez                  mailto:wapiez(_at_)mulberrytech(_dot_)com
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--

