xsl-list

Re: [xsl] Safe-guarding codepoints-to-string() from wrong input

2006-12-20 08:08:32
On 12/20/06, Abel Braaksma <abel(_dot_)online(_at_)xs4all(_dot_)nl> wrote:
> I know that control characters are not allowed and throw an "Invalid XML
> character" error.
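For the underlying question, the check that codepoints-to-string() performs can be sketched outside XSLT. The following is a minimal Python illustration. It assumes the XML 1.0 Char production is the rule in force; XPath 2.0 raises err:FOCH0001 for code points that are not legal XML characters. The names is_xml10_char and safe_codepoints_to_string are hypothetical. Substituting U+FFFD is just one possible policy, and silently dropping the bad code points would work equally well.

```python
def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def safe_codepoints_to_string(codepoints, replacement=0xFFFD):
    """Hypothetical guard: build a string from code points, substituting
    `replacement` for any code point that is not a legal XML 1.0 character
    instead of raising an error."""
    return "".join(chr(cp if is_xml10_char(cp) else replacement) for cp in codepoints)
```

For example, safe_codepoints_to_string([72, 1, 105]) keeps "H" and "i" and replaces the control character U+0001 with U+FFFD.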

If you are receiving strings that contain literal control characters,
they are almost certainly encoded in Windows-1252; just parse them
with that encoding and you'll be OK.
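To see why the encoding matters, here is a small Python sketch. The byte string is a made-up sample. Bytes 0x80 to 0x9F read as ISO-8859-1 become C1 control characters, while the same bytes read as Windows-1252 become typographic punctuation:

```python
# Hypothetical sample of bytes as a Windows application might produce them.
raw = b"\x93quoted\x94 \x96 dash"

# ISO-8859-1 maps every byte straight to the same code point,
# so 0x93 becomes the C1 control character U+0093.
as_latin1 = raw.decode("iso-8859-1")

# Windows-1252 maps 0x93/0x94 to curly quotes and 0x96 to an en dash.
as_cp1252 = raw.decode("windows-1252")
```

Note that Python's strict windows-1252 codec raises UnicodeDecodeError for the five bytes it leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).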

If the string contains control characters as character references,
then it's a bit harder, because the references get expanded using
Unicode code points rather than the Windows-1252 mappings. So you
need to parse and re-serialize the string to expand the references.
(I personally use JTidy with CharEncoding set to Configuration.RAW,
which forces Tidy to output the raw bytes instead of a reference.)
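A rough Python equivalent of that fix, swapping in a post-parse remapping instead of JTidy's Configuration.RAW: after the XML parser has expanded the references, reinterpret any code point in U+0080 to U+009F as a Windows-1252 byte. remap_c1_to_cp1252 is a hypothetical helper, and it assumes every C1 code point in the input really was meant as Windows-1252.

```python
import xml.etree.ElementTree as ET


def remap_c1_to_cp1252(text: str) -> str:
    """Hypothetical helper: reinterpret C1 code points (U+0080-U+009F)
    as Windows-1252 bytes, which is most likely what the author of the
    character references intended."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x80 <= cp <= 0x9F:
            try:
                out.append(bytes([cp]).decode("windows-1252"))
            except UnicodeDecodeError:
                # 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no Windows-1252
                # mapping, so leave those code points unchanged.
                out.append(ch)
        else:
            out.append(ch)
    return "".join(out)


# The XML parser expands &#150; to the Unicode code point U+0096,
# not to the en dash that Windows-1252 assigns to byte 0x96.
doc = ET.fromstring("<p>1990&#150;2006 &#147;quoted&#148;</p>")
expanded = doc.text
fixed = remap_c1_to_cp1252(expanded)
```

Here expanded still holds the C1 code points U+0096, U+0093 and U+0094, while fixed holds the en dash and curly quotes.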

It's a pain...

cheers
andrew

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
To unsubscribe, go to: http://lists.mulberrytech.com/xsl-list/
or e-mail: <mailto:xsl-list-unsubscribe(_at_)lists(_dot_)mulberrytech(_dot_)com>
--~--