xsl-list

Re: [xsl] fault tolerant saxon:parse()

2008-11-17 06:59:01
2008/11/17 David Carlisle <davidc(_at_)nag(_dot_)co(_dot_)uk>:

> > I'm wondering if there's a standard approach for a fault tolerant
> > saxon:parse()   (or alternative equivalent)
>
> personally I've used tagsoup and htmlparse.xsl, but perhaps the nearest
> to a standard these days is http://about.validator.nu/ which implements
> the HTML5 parsing algorithm in Java and exposes (so I'm told) SAX and
> DOM interfaces as if it were reading XML.

Thanks, but I'm looking more for a way of detecting when it's needed...

For example, in the nasty RSS feed for Transport for London's live
travel updates you can have:

<title> &lt;a 
href="/tfl/livetravelnews/realtime/tube/default.html"&gt;Today&lt;/a&gt;
</title>

and:
                
<title>Hammersmith &amp; City</title>

The former needs parsing if you want to process the escaped markup,
but if you do that with the latter you get an error (because the parser
treats the bare ampersand as the start of an entity reference). It's the
same element, so both escaped and non-escaped content need to be handled.

Maybe saxon:try / catch is the only option here...?
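
To make the try/catch idea concrete, here is a rough sketch of it outside
XSLT, in Python rather than Saxon. It assumes the feed has already been
read by an XML parser, so each title arrives as a plain string with its
entities resolved. The function name parse_if_markup and the <wrapper>
element are made up for illustration; this is not a Saxon API.

```python
import xml.etree.ElementTree as ET


def parse_if_markup(text):
    """Return a parsed Element if `text` is well-formed markup, else `text` unchanged."""
    # Cheap pre-check: a string with no '<' cannot contain markup to process.
    if "<" not in text:
        return text
    try:
        # Wrap in a dummy root (hypothetical name) so fragments with
        # surrounding text or several top-level elements still parse.
        return ET.fromstring("<wrapper>" + text + "</wrapper>")
    except ET.ParseError:
        # Not well-formed (e.g. a bare '&' or stray '<'): keep it as text.
        return text


# String values of the two <title> elements after the feed itself is parsed.
escaped = ' <a href="/tfl/livetravelnews/realtime/tube/default.html">Today</a> '
plain = "Hammersmith & City"

first = parse_if_markup(escaped)   # an Element wrapping the <a> link
second = parse_if_markup(plain)    # the original string, untouched
```

The pre-check only avoids pointless parse attempts; the try/except is
what actually decides, so a title such as "a < b & c" still falls back to
plain text.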


-- 
Andrew Welch
http://andrewjwelch.com
Kernow: http://kernowforsaxon.sf.net/
