xsl-list

AW: AW: Detecting carriage return and newline feed in XML Data

2004-11-01 03:49:03
XML input is processed by the XML parser before it gets anywhere near the
XSLT processor. The only way to prevent XML's normalization of whitespace
characters (whether in element or attribute content) is to write the
characters as character references, e.g. &#xA;. You can of course do that by
preprocessing the file in some non-XML-aware tool before submitting it to
the XML parser.
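
A minimal sketch of such a preprocessing step, in Python. The function name `escape_attr_newlines` is made up for illustration, and the scanner assumes the file contains no comments, CDATA sections, or DOCTYPE declaration whose quotes or angle brackets could confuse it. It treats the input as plain text, rewrites every line break inside a quoted attribute value as the character reference `&#10;`, and leaves everything else untouched.

```python
import xml.etree.ElementTree as ET


def escape_attr_newlines(raw: str) -> str:
    """Rewrite line breaks inside attribute values as &#10; (hypothetical helper).

    Assumption: no comments, CDATA sections, or DOCTYPE in the input.
    """
    # Treat CRLF and bare CR like a single line feed, as an XML parser would.
    raw = raw.replace("\r\n", "\n").replace("\r", "\n")
    out = []
    in_tag = False   # True between '<' and the matching '>'
    quote = None     # the open quote character while inside an attribute value
    for ch in raw:
        if quote is not None:
            if ch == quote:
                quote = None
            elif ch == "\n":
                out.append("&#10;")
                continue
        elif in_tag:
            if ch in "\"'":
                quote = ch
            elif ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    raw = '<p v="one\ntwo"/>'
    # A literal line break in an attribute is normalized to a space.
    print(repr(ET.fromstring(raw).get("v")))    # 'one two'
    fixed = escape_attr_newlines(raw)
    # Written as a character reference, the line break survives parsing.
    print(repr(ET.fromstring(fixed).get("v")))  # 'one\ntwo'
```

A UTF-16 file would be read with `open(path, encoding="utf-16")` and written back in the same encoding, so that the XML declaration stays truthful.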

Are you really sure you need to do this? Somehow, you're not using XML the
way it was intended to be used and that's always bad news. I've forgotten
what your original problem was, if you ever explained it.

Michael Kay


OK, let me explain the whole problem:

1. The XML document is generated by System Architect (Popkin Software). This
software is intended to help build EAI (Enterprise Application Integration).
2. Each diagram, and each symbol it contains, has its own user-defined
properties. One of them is a free-text field (here SAProperty/@SAPrpValue),
which we use to describe the property of its respective symbol.
3. The text inside is divided into a number of paragraphs (which are commonly
separated by a carriage return and line feed).
4. System Architect cleverly exports all structured diagrams and their
properties into one single XML file. The text field described above is likewise
stored as an attribute of an XML element. Below is a (tiny) part of the overall
60 MB XML document:

<?xml version="1.0" encoding="UTF-16" ?>
<Classes>
        <Class>
                <SADefinition SAObjId="_2753" SAObjName="app_HybridPost" 
SAObjMinorTypeName="Application"                       SAObjMinorTypeNum="309" 
SAObjMajorTypeNum="3" SAObjAuditId="MiL"                                        
        SAObjUpdateDate="25.08.2004" SAObjUpdateTime="09:20:26">
                <SAProperty SAPrpName="Description"                     
                SAPrpValue="Mit der Anwendung HybridPost wird die bestehende 
Infrastruktur von Postfinance für den                      Druck und die 
Verpackung von Kundendokumenten von Drittkunden im Printcenter Zürich genutzt.
                Das Projekt &quot;Strategie HybridPost&quot;, das sich zur Zeit 
in der Voranalyse-Phase befindet, hat zum               Ziel, die 
HybridPost-Lösung weiterzuentwickeln und zusätzliche Komponenten wie 
Archivierung, Billing,                   Druck, Verpackung und Call-Center in 
die bestehende Lösung zu integrieren.
                Die Plattform HypoShare wird als Teil des Anwendungssystems 
HybridPost modelliert." SAPrpEditType="1"           SAPrpLength="4074"/>
                <SAProperty SAPrpName="GUID" 
SAPrpValue="b1318511-4b95-11d6-8062-00c09f0645a1"                          
SAPrpEditType="1" SAPrpLength="64"/>
                ...
                BLABLABLA....
                ...
        </Class>
</Classes>

5. You'll see that after the words "genutzt." and "integrieren." there is a
carriage return (assuming that your browser displays it).
6. I need to have it printed in my FOP-processed PDF document without losing
the paragraphs.
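
To make the requirement concrete, here is a rough Python sketch of the paragraph recovery, assuming the line breaks reach the parser as character references so that they are not normalized away. The sample values are invented, the function name `description_paragraphs` is hypothetical, and the nesting of SAProperty inside SADefinition is an assumption read off the excerpt above.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the line breaks are written as character references.
SAMPLE = (
    '<Classes><Class>'
    '<SADefinition SAObjName="app_HybridPost">'
    '<SAProperty SAPrpName="Description" '
    'SAPrpValue="First paragraph.&#13;&#10;Second paragraph.&#13;&#10;Third paragraph."/>'
    '</SADefinition>'
    '</Class></Classes>'
)


def description_paragraphs(xml_text):
    """Map each SAObjName to the list of paragraphs in its Description."""
    root = ET.fromstring(xml_text)
    result = {}
    for definition in root.iter("SADefinition"):
        for prop in definition.iter("SAProperty"):
            if prop.get("SAPrpName") != "Description":
                continue
            value = prop.get("SAPrpValue", "")
            # splitlines() treats CR LF, CR, and LF each as one line break.
            paragraphs = [p.strip() for p in value.splitlines() if p.strip()]
            result[definition.get("SAObjName")] = paragraphs
    return result


if __name__ == "__main__":
    for name, paragraphs in description_paragraphs(SAMPLE).items():
        print(name, paragraphs)
```

Each list entry corresponds to one paragraph that the stylesheet would need to emit as its own block in the formatted output.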

I hope this helps ;-)

Cheers

Lawrence