xsl-list

Re: [xsl] Correcting unbound namespace prefixes

2010-08-02 11:21:41
Tony Nassar wrote:
I'm not sure this is the correct place to post. This may be a question about JAXP, or simply about good standard operating procedure for bad input data.
I've got some XML that I know is invalid, but I'm not in a position to get the 
customer to fix it. Here's what it looks like:

Strictly speaking, the term "valid" refers to validity against a DTD or a schema. The problem with this markup is that it is not namespace well-formed.

<document>
   <text>Four score and twenty years ago..,</text>
   <pp:metadata publication-date="2010-07-31T12:30:00Z" />
  ...

You get the idea (I hope): clearly someone began with XML in the "" namespace, extracted 
metadata in a post-processing step, and inserted the corresponding markup without adding the 
necessary namespace declarations or mapping "pp" to one. I don't know of a way to fix 
this through the JAXP API (i.e. interpolating the prefix mapping). Or am I better off just 
preprocessing this XML via Perl or Python before it's ever parsed?

You can't parse that successfully with any namespace-aware parser, because such a parser is required to throw an error on the 'pp:metadata' element name. And XSLT/XPath operate on a data model that is usually created by parsing with a namespace-aware parser, so I don't think XSLT can help with this.
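To make the failure concrete, here is a minimal sketch using the JAXP DOM API. The class name `UnboundPrefixCheck` and the helper `parses` are made up for illustration, and the sample string is a shortened version of the markup quoted above. It only reports whether a parse succeeds with namespace awareness switched on or off; the exact error message depends on the parser implementation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of JAXP.
final class UnboundPrefixCheck {

    // Shortened version of the customer markup: "pp" is never declared.
    static final String SAMPLE =
        "<document><text>Four score</text>"
        + "<pp:metadata publication-date=\"2010-07-31T12:30:00Z\" /></document>";

    // Returns true if the XML parses, false if the parser rejects it.
    static boolean parses(String xml, boolean namespaceAware) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        DocumentBuilder builder = factory.newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // A namespace-aware parser reports the unbound prefix as a fatal error.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace-aware:     " + parses(SAMPLE, true));
        System.out.println("non-namespace-aware: " + parses(SAMPLE, false));
    }
}
```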

I think JAXP does, however, allow you to create non-namespace-aware SAX or DOM parsers (see e.g. http://download-llnw.oracle.com/javase/6/docs/api/javax/xml/parsers/SAXParserFactory.html#isNamespaceAware()). That way you should at least be able to parse the markup without an error. You will get element names containing colons, though, and will need to find a way to create namespace well-formed markup from them. That is not something I am familiar with, and it is not really on topic here.
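One possible way to do that, offered only as a rough sketch: parse with a non-namespace-aware DOM parser, then copy the tree into a new document using `createElementNS`, binding the `pp` prefix along the way. The class `PrefixRepair`, its methods, and the namespace URI `http://example.com/ns/pp` are all assumptions for illustration; the real URI would have to come from whoever owns the metadata vocabulary. The sketch also assumes `pp` is the only prefix in the input, as in the sample; any other prefixed name would make `createElementNS` throw a `DOMException`.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JAXP.
final class PrefixRepair {

    // Assumed namespace URI for the "pp" prefix; replace with the real one.
    static final String PP_NS = "http://example.com/ns/pp";

    static Document repair(String xml) throws Exception {
        // Step 1: parse without namespace awareness, so "pp:metadata" is just a name.
        DocumentBuilderFactory rawFactory = DocumentBuilderFactory.newInstance();
        rawFactory.setNamespaceAware(false);
        Document raw = rawFactory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Step 2: rebuild the tree in a fresh document with proper namespace nodes.
        DocumentBuilderFactory nsFactory = DocumentBuilderFactory.newInstance();
        nsFactory.setNamespaceAware(true);
        Document fixed = nsFactory.newDocumentBuilder().newDocument();

        Element root = (Element) copy(raw.getDocumentElement(), fixed);
        // Declare the prefix explicitly on the root so serialized output is well-formed.
        root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:pp", PP_NS);
        fixed.appendChild(root);
        return fixed;
    }

    // Maps a qualified name to a namespace URI; only "pp", "xmlns" and "xml" are known.
    static String namespaceFor(String qName) {
        if (qName.startsWith("pp:")) return PP_NS;
        if (qName.equals("xmlns") || qName.startsWith("xmlns:")) {
            return XMLConstants.XMLNS_ATTRIBUTE_NS_URI;
        }
        if (qName.startsWith("xml:")) return XMLConstants.XML_NS_URI;
        return null; // unprefixed names stay in no namespace
    }

    static Node copy(Node source, Document target) {
        if (source.getNodeType() != Node.ELEMENT_NODE) {
            // Text, comments, etc. carry no namespace information; import as-is.
            return target.importNode(source, true);
        }
        String name = source.getNodeName();
        Element element = target.createElementNS(namespaceFor(name), name);
        NamedNodeMap attributes = source.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node a = attributes.item(i);
            element.setAttributeNS(namespaceFor(a.getNodeName()),
                    a.getNodeName(), a.getNodeValue());
        }
        for (Node c = source.getFirstChild(); c != null; c = c.getNextSibling()) {
            element.appendChild(copy(c, target));
        }
        return element;
    }

    public static void main(String[] args) throws Exception {
        Document doc = repair("<document><text>Four score</text>"
                + "<pp:metadata publication-date=\"2010-07-31T12:30:00Z\" /></document>");
        Node metadata = doc.getDocumentElement().getLastChild();
        System.out.println(metadata.getNodeName() + " is in namespace "
                + metadata.getNamespaceURI());
    }
}
```

The resulting `Document` can then be handed to an XSLT processor as a `DOMSource`. Whether this is preferable to a textual preprocessing step in Perl or Python probably depends on how predictable the broken input is.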



--

        Martin Honnen
        http://msmvps.com/blogs/martin_honnen/

