xsl-list

RE: How to escape invalid characters in XML to XML process

2003-01-22 00:50:02


Vijaya Kumar Y wrote:

> Hi,
> I'm receiving my input XML from an ASP page, and I get a lot of
> characters that are invalid for XML. How can I escape/convert these
> invalid characters into valid XML?

In general, whenever I have this problem I always go back to whatever is
generating the XML, in your case the ASP page.  It is much, much easier
to filter out problem characters *before* you put them into the XML document
than after you already have an ill-formed XML document.  Assuming your ASP
is not trying to add any non-printable (control) characters to the XML,
which XML 1.0 forbids even as character references, it usually
suffices to filter and replace characters as follows:
    For any text node child of an element:
    o   "<"  becomes  "&lt;"
    o   "&"  becomes  "&amp;"

    For any attribute value:
    o   "<"  becomes  "&lt;"
    o   "&"  becomes  "&amp;"
    o   '"'  becomes  '&quot;' (if you are using quote(") to
        delimit the attribute value)
    o   "'"  becomes  "&apos;" (if you are using apostrophe(')
        to delimit the attribute value)
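A minimal Python sketch of these rules, for illustration only: the real filtering would live in the ASP page (VBScript or JScript), and the function names below are hypothetical. It also drops characters outside the XML 1.0 `Char` production (most control characters), since those cannot be fixed by escaping, and it escapes `>` as well, which XML strictly requires only inside the sequence `]]>`.

```python
import re

# Sketch only: hypothetical helpers mirroring the escaping rules above.

# Matches characters NOT allowed by the XML 1.0 "Char" production. They cannot
# be escaped (not even as &#...; references), so the only option is to drop them.
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_chars(s: str) -> str:
    """Remove characters that may not appear anywhere in an XML 1.0 document."""
    return _ILLEGAL_XML_CHARS.sub("", s)


def escape_text(s: str) -> str:
    """Escape a string for use as the text content of an element."""
    # Replace '&' first, otherwise the '&' in '&lt;' would be escaped again.
    s = s.replace("&", "&amp;").replace("<", "&lt;")
    # '>' must be escaped only in the sequence ']]>', but escaping it
    # everywhere is always safe.
    return s.replace(">", "&gt;")


def escape_attr(s: str, quote: str = '"') -> str:
    """Escape a string for an attribute value delimited by `quote` (" or ')."""
    s = escape_text(s)
    if quote == '"':
        return s.replace('"', "&quot;")
    if quote == "'":
        return s.replace("'", "&apos;")
    raise ValueError("quote must be '\"' or \"'\"")
```

For example, `escape_text("Tom & Jerry <3")` gives `Tom &amp; Jerry &lt;3`. If the XML happens to be generated from Python, the standard library's `xml.sax.saxutils.escape` and `quoteattr` cover the same ground.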


> I need to do this because after the transformation I post my output
> XML once again to an ASP page. This is the process:
>  getXML from ASP --> applyXSLT --> output XML --> validateXML (use Schema)
>  --> postXML to ASP
> Can I do that in XSLT when I receive my input XML, by matching these
> characters and converting them? If yes, how?

No, because XSLT entirely presupposes the well-formedness of the XML input.
XSLT processing (at a given node at least) does not even begin until after
parsing (which would have detected the ill-formed code and aborted there).
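To illustrate the ordering, here is a small Python sketch. Python's standard library has no XSLT processor, so the hypothetical `transform` function below merely upper-cases text nodes as a stand-in for `applyXSLT`; the point is that `ET.fromstring` rejects the ill-formed input before any transformation code runs.

```python
import xml.etree.ElementTree as ET


def transform(xml_text: str) -> str:
    """Hypothetical stand-in for 'applyXSLT': parse first, then transform."""
    # Parsing comes first; ill-formed input raises ET.ParseError right here.
    root = ET.fromstring(xml_text)
    # Toy "transformation": upper-case every element's text.
    for el in root.iter():
        if el.text:
            el.text = el.text.upper()
    return ET.tostring(root, encoding="unicode")


def try_transform(xml_text: str) -> str:
    """Run transform(), reporting parse failures instead of raising."""
    try:
        return transform(xml_text)
    except ET.ParseError as err:
        return f"rejected by parser: {err}"
```

`transform("<name>Tom &amp; Jerry</name>")` returns `<name>TOM &amp; JERRY</name>`, while `try_transform("<name>Tom & Jerry</name>")` reports a parse error and never reaches the loop.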


> Validation is done by Schema.
> Is there any template XSLT Script that does that?

Schema validation via XSLT?  Yes, that has been done.  But it is not
relevant here.  Your problem is not document validation (that is, "does the
document match the constraints of a DTD or set of schemata?"), but
fundamental document well-formedness (that is, "Is this document an XML
document at all?").


> Thank you for your help.
> If there are other ways of doing that, I would appreciate any pointers.

Again, I think it all comes back to the ASP that is generating the XML in
the first place.  In the general case it is literally impossible to "fix" an
illegal character ill-formedness problem automatically.  In specific cases
it is merely very, very difficult.  You would have to write your own
specialized XML parser with a sophisticated DTD- or schema-aware editing
capability.  In every case I have ever seen, it is simpler to just fix the
problem at the source.


-- Roger Glover
   glover_roger(_at_)yahoo(_dot_)com



 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list


