Michael Kay wrote:
> If someone sends you a document that isn't well-formed XML, the best
> strategy is to get the people who produced it to mend their ways.
True. However, having &nbsp; in an XML file and finding out that all of
a sudden XML is not XML anymore must be among the most frequent
unpleasant surprises fresh XML programmers have to deal with. I believe
it was one of my first questions to this list as well. And my
first reaction was: that cannot be, everybody knows &nbsp;, how can it
_not_ be XML?
The thing is, XML is a very generic and extensible language, and
entities are one thing that can be extended (beyond the five that are
always allowed: &lt;, &gt;, &amp;, &apos; and &quot;). This is done by
declaring the entities in the document's own DOCTYPE declaration, like
Patrick suggested, or by linking to an external DTD file that declares
them.
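
As a minimal sketch of the first option (the element names doc and p are
made up for illustration, they are not from the original poster's input),
an internal DTD subset that declares nbsp could look like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The internal subset declares nbsp as the no-break space, U+00A0. -->
<!DOCTYPE doc [
  <!ENTITY nbsp "&#160;">
]>
<doc>
  <p>Price:&nbsp;10&nbsp;EUR</p>
</doc>
```

With that declaration in place, a conforming parser expands &nbsp; to the
character U+00A0 before the stylesheet ever sees it.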
If your input comes from XHTML or HTML, this happens often. The fix is
to keep the original doctype declaration and make sure that the DTDs it
refers to are available. That way other entities like &mdash;, &uml; and
&copy; are also recognized in the majority of cases.
You can find the declarations of all these entities here:
http://www.w3.org/TR/xhtml1/dtds.html#a_dtd_Latin-1_characters. That page
also shows a typical declaration for use in XML. Download the file at
http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent and refer to your local
copy, and you can work with almost all XHTML/HTML input, as long as the
rest is well-formed.
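
A sketch of how the local copy might be wired in, assuming xhtml-lat1.ent
was saved in the same directory as the input document (the relative path
and the doc element are assumptions, adjust them to your own setup):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE doc [
  <!-- Parameter entity pointing at the local copy of the Latin-1 set. -->
  <!ENTITY % HTMLlat1 SYSTEM "xhtml-lat1.ent">
  <!-- Referencing it pulls all of its entity declarations into this DTD. -->
  %HTMLlat1;
]>
<doc>&copy; 2011, caf&eacute;&nbsp;menu</doc>
```

This only helps if the parser actually reads external entities. A
non-validating parser is allowed to skip them, so check how your
processor is configured if the entities still come up as undefined.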
Kind regards,
Abel Braaksma
------------------------------------------------------------------------
From: Michael Kay <mike(_at_)saxonica(_dot_)com>
Sent: Wednesday, August 10, 2011 10:19:17 AM
To: xsl-list
Cc:
Subject: Re: [xsl] nbsp fails transformation
Now since i can't even transform those files i can't throw those
entities out.
How do i handle this !?
If someone sends you a document that isn't well-formed XML, the best
strategy is to get the people who produced it to mend their ways. Once
you start accepting bad XML (or non-XML, as I prefer to call it), all
the benefits of using XML for interchange quickly become lost, and you
might as well revert to using some proprietary interchange format.
Michael Kay
Saxonica
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--