xsl-list

RE: data vs. xml

2003-04-04 08:15:05
I modified the XML generator to include <![CDATA[ ]]>
sections. However, I also found that some of my data
contains RTF characters (e.g. \x093, \x096, \xB0). I believe
this is a result of a copy and paste from MS Word into the
database program, and it is not an easy thing to fix as the
database contains tens of thousands of entries. I also
noticed that the XSLT processor (Instant Saxon) still had
difficulty accepting a <![CDATA[ ]]> section that contained one
of the above characters. My understanding is that the data
found within the <![CDATA[ ]]> should be considered just
that: data.

A CDATA section must contain legitimate XML characters.

I suspect that your problem is that your XML source file has no encoding
declaration, so the encoding is defaulting to UTF-8, and a lone octet
such as 0xB0 is not a valid UTF-8 encoding of any XML character.
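
To illustrate the point about the octet, here is a minimal Python sketch. It assumes, as suggested in this reply, that the data really is cp1252; the helper name decodes_as is made up for the example.

```python
# The single octet 0xB0 mentioned in the post.
raw = b"\xb0"


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# In UTF-8, 0xB0 is a continuation byte and cannot stand on its own.
utf8_ok = decodes_as(raw, "utf-8")

# In cp1252 (assumed here) the same octet is U+00B0, the degree sign.
cp1252_ok = decodes_as(raw, "cp1252")
as_cp1252 = raw.decode("cp1252")

# The same character written as UTF-8 takes two octets.
as_utf8_bytes = as_cp1252.encode("utf-8")

print(utf8_ok, cp1252_ok, as_utf8_bytes)
```

So the octet is only invalid under the UTF-8 assumption; under cp1252 it is an ordinary character.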

You should specify the actual encoding of the XML file in an XML
declaration at its start. Your encoding is probably cp1252 (registered
as windows-1252). Then you need to parse it using an XML parser that
recognizes this encoding: XML parsers are not required to support any
encodings other than UTF-8 and UTF-16.
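
As a rough sketch of the effect of the declaration, the following uses Python's standard xml.etree.ElementTree parser rather than Saxon. The element name entry, the sample text, and the helper try_parse are invented for the example, and windows-1252 is only the guessed encoding from above.

```python
import xml.etree.ElementTree as ET

# Hypothetical record holding cp1252 octets inside a CDATA section:
# 0xB0 (degree sign), 0x93/0x94 (curly quotes), 0x96 (en dash).
body = b"<entry><![CDATA[72\xb0 \x93quoted\x94 \x96 text]]></entry>"


def try_parse(document: bytes):
    """Return the root element's text, or None if the parser rejects it."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# No declaration: the parser assumes UTF-8 and rejects the stray octets,
# even though they sit inside CDATA.
undeclared = try_parse(body)

# With the declaration, the same octets are read as cp1252 characters.
declared = try_parse(
    b'<?xml version="1.0" encoding="windows-1252"?>\n' + body
)

print(undeclared)
print(declared)
```

This parser happens to accept windows-1252; whether a given parser does is implementation-dependent, so it is worth checking the one behind your XSLT processor.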

Michael Kay
Software AG
home: Michael(_dot_)H(_dot_)Kay(_at_)ntlworld(_dot_)com
work: Michael(_dot_)Kay(_at_)softwareag(_dot_)com 


 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


