One last question:
Since the HTML is well-formed XML (XHTML), can you point me to a resource
(a DTD) to start with?
Thanks,
Max
David Carlisle wrote:
2.) get a fitting dtd/schema which maps these entities to unicode characters
Would either one be a good starting point?
It would have to be a DTD (schemas don't do entity definitions). This is
the "standard" way of doing this, so long as the "html" you are getting
is well-formed XML. But most HTML isn't even valid HTML, never mind being
well formed, in which case, as Michael said, using tag soup is a better
option, as it is designed to forgive in places where a browser would
forgive (but an XML parser would give a fatal error).
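
A minimal sketch of the difference, assuming Python's standard xml.etree.ElementTree as a stand-in for whatever XML parser sits in front of your stylesheet. The two entity declarations in the internal subset are illustrative only; a real setup would more likely pull in a complete entity set, such as the ones shipped with the XHTML 1.0 DTDs. The names WITHOUT_DTD, WITH_DTD and parse_text are made up for this example.

```python
import xml.etree.ElementTree as ET

# Well-formed markup, but the entities are not declared anywhere.
WITHOUT_DTD = "<p>caf&eacute;&nbsp;bar</p>"

# Same markup, with a DTD internal subset mapping each entity
# to a Unicode character via a numeric character reference.
WITH_DTD = """<!DOCTYPE p [
  <!ENTITY nbsp "&#160;">
  <!ENTITY eacute "&#233;">
]>
<p>caf&eacute;&nbsp;bar</p>"""


def parse_text(doc):
    """Parse the document and return the text of the root element."""
    return ET.fromstring(doc).text


# Without declarations, the undefined entity is a fatal parse error.
try:
    parse_text(WITHOUT_DTD)
    undefined_entity_is_fatal = False
except ET.ParseError:
    undefined_entity_is_fatal = True

# With declarations, the parser expands the entities to characters.
expanded = parse_text(WITH_DTD)
```

Neither case helps with input that is not well formed to begin with; that is where a forgiving parser comes in.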
David
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--