Tony Nassar wrote:
I'm not sure this is the correct place to post. This may be a question about JAXP, or simply about good standard operating procedure for bad input data.
I've got some XML that I know is invalid, but I'm not in a position to get the
customer to fix it. Here's what it looks like:
Strictly speaking, the term "valid" means validity against a DTD or a
schema. The problem with that markup is that it is not namespace well-formed.
<document>
<text>Four score and twenty years ago..,</text>
<pp:metadata publication-date="2010-07-31T12:30:00Z" />
...
You get the idea (I hope): clearly someone began with XML in the "" namespace, extracted
metadata in a post-processing step, and inserted the corresponding markup without adding the
necessary namespace declarations or mapping "pp" to one. I don't know of a way to fix
this through the JAXP API (i.e. interpolating the prefix mapping). Or am I better off just
preprocessing this XML via Perl or Python before it's ever parsed?
You can't parse that successfully with any namespace-aware parser, because
such a parser is required to report an error on the 'pp:metadata' element
name: the prefix "pp" is not bound to any namespace. XSLT and XPath operate
on a data model that is usually built by a namespace-aware parser, so I
don't think XSLT can help here.
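
To see that failure from JAXP itself, here is a minimal sketch. The class and method names (UnboundPrefixCheck, rejects) are made up for illustration, and the method reports any parse error, not only an unbound prefix:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of JAXP.
class UnboundPrefixCheck {
    // Returns true if a namespace-aware parse of the given markup fails
    // with a parse error (for example, because a prefix is not bound).
    static boolean rejects(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // JAXP factories are not namespace aware by default
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXParseException e) {
            return true;
        }
    }
}
```

With the markup from the question, rejects(...) should return true; once a declaration such as xmlns:pp="..." is present on the root element, the same call should return false.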
I think JAXP, however, allows you to create non-namespace-aware SAX or DOM
parsers (see e.g.
http://download-llnw.oracle.com/javase/6/docs/api/javax/xml/parsers/SAXParserFactory.html#isNamespaceAware()).
That way you should at least be able to parse that markup without an
error. You will get element names containing colons, though, and will need
to find a way to create namespace well-formed markup from them. That is not
something I am familiar with, and it is not really on topic here.
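
One possible way to do that last step, offered only as a rough sketch: parse with a non-namespace-aware DocumentBuilder, then copy the tree into a new namespace-aware document, mapping the "pp" prefix to a namespace URI on the way. The class name PrefixRepair, its methods, and the URI http://example.com/ns/pp are assumptions for illustration; the real URI would have to come from whoever owns the data. Names with any other prefix (including xmlns:* attributes) are not handled here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JAXP.
class PrefixRepair {
    // Assumed namespace URI for the "pp" prefix; replace with the real one.
    static final String PP_NS = "http://example.com/ns/pp";

    // Parse without namespace processing, so "pp:metadata" is just a name.
    static Document parseLenient(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Copy the lenient tree into a fresh namespace-aware document.
    static Document rebuild(Document raw) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document out = factory.newDocumentBuilder().newDocument();
        out.appendChild(copy(raw.getDocumentElement(), out));
        return out;
    }

    // Names starting with "pp:" go into PP_NS; unprefixed names get no namespace.
    static String namespaceFor(String qname) {
        return qname.startsWith("pp:") ? PP_NS : null;
    }

    static Node copy(Node node, Document out) {
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            // Text, comments, etc. carry no names, so a plain import is enough.
            return out.importNode(node, false);
        }
        String qname = node.getNodeName();
        Element element = out.createElementNS(namespaceFor(qname), qname);
        NamedNodeMap attrs = node.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            element.setAttributeNS(namespaceFor(attr.getName()), attr.getName(), attr.getValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            element.appendChild(copy(child, out));
        }
        return element;
    }
}
```

Calling PrefixRepair.rebuild(PrefixRepair.parseLenient(xml)) should then give a DOM in which the metadata element has the local name "metadata" and the assumed namespace URI, which is the kind of tree XSLT and XPath expect. Whether this is preferable to fixing the text before parsing is a judgment call.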
--
Martin Honnen
http://msmvps.com/blogs/martin_honnen/
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--