xsl-list

Re: resolve html entities

2005-10-31 02:42:44
Thanks for the suggestion.

I had hoped this would be easier, since these are all "standard" HTML entities. I can think of two
possible approaches:

1.) The ugly one: do a string replace.

2.) Get a fitting DTD/schema that maps these entities to Unicode characters.

Would either one be a good starting point?
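
As a rough illustration of the mapping idea in 2.), here is a minimal Python sketch. Python is only an illustration language here; nothing in this thread uses it. Instead of a DTD it relies on the standard library's `html.entities.name2codepoint` table, and the function name `resolve_html_entities` is made up for this example. It deliberately leaves the entities that XML itself predefines (`&amp;`, `&lt;`, ...) alone, on the assumption that the result should stay well-formed for a later XSLT step.

```python
import re
from html.entities import name2codepoint

# Entities predefined by XML itself; resolving them would break the markup.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references such as &uuml; or &szlig;
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_html_entities(text: str) -> str:
    """Replace named HTML entities with the Unicode characters they stand for.

    Unknown names and XML's predefined entities are left untouched.
    """
    def _replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return chr(name2codepoint[name])

    return _ENTITY.sub(_replace, text)


if __name__ == "__main__":
    print(resolve_html_entities("<p>M&uuml;ller &amp; S&ouml;hne</p>"))
```

This is still a string replace under the hood, just driven by a complete entity table rather than a hand-written list, so it shares the weakness of approach 1.): it knows nothing about the surrounding markup.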

Thanks,
Max



Michael Kay wrote:

I would suggest parsing the HTML using John Cowan's TagSoup parser. This
looks to the XSLT processor just like an XML parser, so you can probably
integrate it directly - depending on the XSLT processor that you are using.
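
To make the idea concrete without claiming anything about TagSoup's own API (TagSoup is a Java SAX parser, and it is not used below), here is a small Python sketch of the same principle: run the HTML through a forgiving parser that resolves entities while parsing, and hand well-formed XML to the next step. It uses the standard library's `html.parser.HTMLParser` as a stand-in; the class `HtmlToXml`, the helper `to_xml`, and the short `VOID` tag list are assumptions made for this example. Comments and processing instructions are simply dropped.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr

# Tags that never have content; the list is intentionally short (assumption).
VOID = {"br", "hr", "img"}


class HtmlToXml(HTMLParser):
    """Lenient HTML reader that emits well-formed XML text."""

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser resolve &uuml; etc. itself.
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(f" {k}={quoteattr(v or '')}" for k, v in attrs)
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.open_tags:
            return  # stray end tag: ignore it
        # Close everything opened after `tag`, then `tag` itself.
        while self.open_tags:
            open_tag = self.open_tags.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Data arrives with entities already resolved; re-escape only & < >.
        self.out.append(escape(data))


def to_xml(html_text: str) -> str:
    parser = HtmlToXml()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    while parser.open_tags:  # close tags the input left open
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


if __name__ == "__main__":
    print(to_xml("<p>Gr&uuml;&szlig;e &amp; mehr<br>Zeile 2"))
```

The point of going through a parser rather than a string replace is that unclosed tags, bare `<br>` elements and entity references are all handled in one pass, so the XSLT processor only ever sees plain Unicode text inside well-formed markup.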

Michael Kay
http://www.saxonica.com/
-----Original Message-----
From: Maximilian Gärber [mailto:max(_at_)gaerber(_dot_)de]
Sent: 31 October 2005 08:40
To: xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com
Subject: [xsl] resolve html entities

Hi,

I know this is a common question, but I could not find a specific answer to it:

I am exporting texts containing HTML markup from a database. Now I need to transform
the HTML into something usable in a DTP application.

The tags are not the problem, because I am only allowing a subset of HTML, but the HTML entities
(German umlauts, special characters) need to be transformed into plain Unicode (UTF-8) characters.

What is the best way to achieve this?

Thanks,

Max Gaerber

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--




