If you cannot know beforehand which entities will be passed in, it is
generally best to find a DTD online that declares all allowed
HTML entities (you mentioned &nbsp;, but perhaps your favorite editor
also throws in &uuml;, &cent;, &igrave; etc.). You can get a fairly
definitive list from the W3C, of course; here's a starting point:
http://www.w3.org/TR/REC-html40/sgml/entities.html#h-24.2.1
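To illustrate the DTD idea outside any particular CMS, here is a minimal sketch in Python (an assumption for illustration; the original system is PHP/XSL). It declares the HTML named entities in an internal DTD subset so that a stock XML parser accepts them. The helper names build_doctype and parse_fragment and the wrapper element "content" are hypothetical; the entity table comes from Python's standard html.entities module rather than from the W3C files directly.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML already predefines these; they are left alone, since redeclaring
# lt or amp with a plain character reference would break parsing.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}


def build_doctype(root_name: str) -> str:
    """Return a DOCTYPE whose internal subset declares the HTML named entities."""
    decls = "\n".join(
        f'  <!ENTITY {name} "&#{codepoint};">'
        for name, codepoint in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    )
    return f"<!DOCTYPE {root_name} [\n{decls}\n]>\n"


def parse_fragment(fragment: str, root_name: str = "content") -> ET.Element:
    """Wrap an editor-produced fragment in a root element and parse it as XML."""
    doc = f"{build_doctype(root_name)}<{root_name}>{fragment}</{root_name}>"
    return ET.fromstring(doc)


if __name__ == "__main__":
    root = parse_fragment("<p>a&nbsp;b &uuml; &cent; &igrave;</p>")
    print(repr(root.find("p").text))
```

The fragment itself still has to be well-formed; the declarations only make the named entities known to the parser.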

Alternatively (but *not* recommended!): if you use Saxon 8.9 you can
read the document with unparsed-text(), replace the named entities
manually with their numeric equivalents, and reparse the result using
saxon:parse(). I'd vote against this, though, as it goes against the
idea of processing the input as proper XML. Using a catalog, as David
suggested, is probably easier.
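
The "replace named entities by their numeric equivalents" step can also be done before the text ever reaches the XSLT processor. The following is only a sketch in Python (not the Saxon approach described above, and not the PHP the original system uses); the function name to_numeric_entities is hypothetical, and the entity table is the one shipped in Python's html.entities module. Unknown names and the five XML built-ins are passed through untouched.

```python
import re
from html.entities import name2codepoint

# Entities every XML parser already understands; leave them as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named entity reference such as &nbsp; (not numeric ones like &#160;).
_NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_entities(text: str) -> str:
    """Replace known HTML named entities with numeric character references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep built-ins and unknown names unchanged
        return f"&#{name2codepoint[name]};"

    return _NAMED_ENTITY.sub(replace, text)


if __name__ == "__main__":
    print(to_numeric_entities("<p>a&nbsp;b &amp; &uuml;</p>"))
```

Note that this is plain text substitution: it will also rewrite entity-like strings inside CDATA sections or comments, which is one more reason to prefer the DTD or catalog route.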

Finally: you ask for alternatives to your editor. I've tried TinyMCE but
I did not like it (lack of standards support). I now use FCKeditor and
it rocks (biased opinion!). A very useful (though recent) addition to
the configuration is that you can force it to output real XHTML 1.1 and
have it replace all named entities with numeric ones, which makes it a
good partner when you need additional processing. The editor works with
all major browsers (including Safari, Opera, Konqueror) and is open source.
Cheers,
-- Abel Braaksma
Nick Shepherd wrote:
I use XSL in a homegrown content management system I wrote in php for
the templating system. One of the problems I have encountered before
was the use of entities like "&nbsp;" and the such. When not wrapped
in CDATA tags it would always give an error unless it was replaced
with the numeric equivalent. This data not being wrapped in CDATA is
imperative because it is being used to output html that has been
inputed from a textarea to the screen using xsl:copy-of...
Now to my question, although basic compared to the questions generally
asked on this list, is there any way to prevent these entities from
producing errors? We've come to a point with one of our products that
allows users to create their own websites and this type of
functionality is needed because the rich text editor of choice loves
to throw "&nbsp;" everywhere (tiny mce). Any ideas? Or alternatives
to the rich text editor that would allow non-techy users to edit the
look and feel of their content on their sites?
Nick Shepherd
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--