xsl-list

RE: AW: Detecting carriage return and newline feed in XML Data

2004-11-01 04:26:04
OK, it seems that what you need to preserve is the fact that there is a
newline - not the particular representation of the newline (NL vs CR/LF). 

Newlines in attribute values are not considered significant by an XML
parser, and are converted to spaces. For this kind of content it would be
much better to generate elements rather than attributes, then the newlines
(but not their particular representation) would be preserved. The
alternative would be to represent the newlines within the attribute as
&#xA;. If you can't change the code that generates the XML, then your only
option is to preprocess the XML with some non-XML-aware tool.
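
The normalization described above can be observed with any conforming parser. The following is a minimal sketch using Python's standard-library ElementTree; the element name `p` and attribute name `v` are made up for illustration and do not come from the original data.

```python
import xml.etree.ElementTree as ET

# A literal newline inside an attribute value: the parser turns it into a space.
literal = '<p v="line one\nline two"/>'

# The same newline written as a character reference: it survives parsing.
escaped = '<p v="line one&#xA;line two"/>'

# The same text as element content: the newline is kept, but CR/LF is
# normalized to a single LF, so the original representation is lost.
as_element = '<p><v>line one\r\nline two</v></p>'

attr_literal = ET.fromstring(literal).get("v")
attr_escaped = ET.fromstring(escaped).get("v")
elem_text = ET.fromstring(as_element).find("v").text

print(repr(attr_literal))
print(repr(attr_escaped))
print(repr(elem_text))
```

Only the second and third forms still carry a newline by the time a stylesheet sees the value.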

XSLT can't help you here, I'm afraid: the damage is done before XSLT kicks
in.

Michael Kay
http://www.saxonica.com/

-----Original Message-----
From: michella(_at_)post(_dot_)ch [mailto:michella(_at_)post(_dot_)ch] 
Sent: 01 November 2004 10:49
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: AW: AW: [xsl] Detecting carriage return and newline 
feed in XML Data

XML input is processed by the XML parser before it gets anywhere near the
XSLT processor. The only way to prevent XML's normalization of whitespace
characters (whether in element or attribute content) is to write the
characters as character references, e.g. &#xA;. You can of course do that by
preprocessing the file in some non-XML-aware tool before submitting it to
the XML parser.
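
One possible shape for such a preprocessing step is sketched below in Python. It treats the file as plain text and rewrites literal line breaks inside double-quoted attribute values as `&#xA;`. The function name `escape_attr_newlines` and the regular expressions are assumptions made for this sketch, not part of any tool mentioned in the thread. It also assumes that every `="..."` sequence in the file is an attribute value, which would not hold for documents with such text in element content, comments or CDATA sections.

```python
import re
import xml.etree.ElementTree as ET

# A double-quoted attribute value. Literal '"' and '<' cannot occur inside
# one, so [^"<]* never runs past the closing quote.
ATTR_VALUE = re.compile(r'="([^"<]*)"')
# Any of the three common line-break representations.
NEWLINE = re.compile(r'\r\n|\r|\n')

def escape_attr_newlines(xml_text: str) -> str:
    """Replace literal line breaks inside attribute values with &#xA;."""
    def fix(match):
        return '="' + NEWLINE.sub('&#xA;', match.group(1)) + '"'
    return ATTR_VALUE.sub(fix, xml_text)

# Tiny stand-in for the exported data (hypothetical, heavily shortened).
raw = '<SAProperty SAPrpName="Description"\n SAPrpValue="genutzt.\r\nDas Projekt"/>'
fixed = escape_attr_newlines(raw)
value = ET.fromstring(fixed).get("SAPrpValue")
print(repr(value))
```

Line breaks between attributes are left alone; only those inside the quotes are rewritten. For a real file the text would have to be read and written with the file's own encoding (UTF-16 in the sample below) and without newline translation, for example `open(path, encoding="utf-16", newline="")`.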

Are you really sure you need to do this? Somehow, you're not using XML the
way it was intended to be used and that's always bad news. I've forgotten
what your original problem was, if you ever explained it.

Michael Kay


OK, let me explain the whole problem:

1. The XML document is generated by System Architect (Popkin Software).
This software is intended to help build EAI (Enterprise Application
Integration).
2. Each diagram, and each symbol it contains, has its own user-defined
properties. One of them is a free text field (here
SAProperty/@SAPrpValue), which we use to freely describe the property of
the respective symbol.
3. The text inside is divided into a number of paragraphs (which are
commonly separated by a carriage return and line feed).
4. System Architect cleverly exports all structured diagrams and their
properties into one single XML file. The text field described above is
likewise stored as an attribute of an XML element. Below is a (tiny) part
of the overall 60 MB XML document:

<?xml version="1.0" encoding="UTF-16" ?>
<Classes>
      <Class>
              <SADefinition SAObjId="_2753" SAObjName="app_HybridPost"
                      SAObjMinorTypeName="Application" SAObjMinorTypeNum="309"
                      SAObjMajorTypeNum="3" SAObjAuditId="MiL"
                      SAObjUpdateDate="25.08.2004" SAObjUpdateTime="09:20:26">
              <SAProperty SAPrpName="Description"
                      SAPrpValue="Mit der Anwendung HybridPost wird die bestehende Infrastruktur von Postfinance für den Druck und die Verpackung von Kundendokumenten von Drittkunden im Printcenter Zürich genutzt.
              Das Projekt &quot;Strategie HybridPost&quot;, das sich zur Zeit in der Voranalyse-Phase befindet, hat zum Ziel, die HybridPost-Lösung weiterzuentwickeln und zusätzliche Komponenten wie Archivierung, Billing, Druck, Verpackung und Call-Center in die bestehende Lösung zu integrieren.
              Die Plattform HypoShare wird als Teil des Anwendungssystems HybridPost modelliert."
                      SAPrpEditType="1" SAPrpLength="4074"/>
              <SAProperty SAPrpName="GUID"
                      SAPrpValue="b1318511-4b95-11d6-8062-00c09f0645a1"
                      SAPrpEditType="1" SAPrpLength="64"/>
              ...
              BLABLABLA....
              ...
      </Class>
</Classes>

5. You'll see that after the words "genutzt." and "integrieren." there is a
carriage return (assuming that your browser handles it).
6. I need to have it printed in my FOP-processed PDF document without
losing the paragraphs.
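
Once the line breaks survive parsing (for instance because they were written as character references), turning the value into separate paragraphs is a matter of splitting on the newline and wrapping each piece in its own block. The following Python sketch illustrates the idea only; the function name `paragraphs_to_fo_blocks` is hypothetical, and the `fo` prefix is assumed to be bound to the XSL-FO namespace in the enclosing document.

```python
from xml.sax.saxutils import escape

def paragraphs_to_fo_blocks(value: str) -> str:
    """Emit one fo:block per non-empty line of the attribute value."""
    # Split on LF (the parser has already normalized CR/LF) and drop the
    # indentation and blank lines left over from the export.
    paragraphs = [p.strip() for p in value.split("\n") if p.strip()]
    # escape() protects &, < and > so the result is well-formed markup.
    return "\n".join(f"<fo:block>{escape(p)}</fo:block>" for p in paragraphs)

sample = "Erster Absatz.\n   Zweiter Absatz mit A & B.\n"
blocks = paragraphs_to_fo_blocks(sample)
print(blocks)
```

In an XSLT 1.0 stylesheet the same split would typically be written as a recursive template using substring-before() and substring-after() on the newline character.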

I hope it will help ;-)

Cheers

Lawrence
