David Carlisle wrote:
Unfortunately, that says it all. Control characters are not allowed in
UTF-8 and as a result, are not allowed in XML, when the encoding is
UTF-8 (making XML not well-formed)
Not so, utf8 can encode control characters, but they are not allowed in
XML 1.0 (whatever the encoding)
David
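To make David's point concrete, here is a minimal sketch in Python of the Char production from section 2.2 of the XML 1.0 Recommendation. The function name is_xml10_char is my own, for illustration only; it is not part of any XML library. It shows that U+0008 encodes without trouble as UTF-8 and is still not a legal XML 1.0 character.

```python
# Char production from XML 1.0, section 2.2:
#   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
def is_xml10_char(ch: str) -> bool:
    """Return True if the single character ch is allowed in XML 1.0.

    Hypothetical helper for illustration; the ranges are those of the
    Char production quoted above.
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


backspace = "\x08"
# The encoding itself has no problem with the control character ...
print(backspace.encode("utf-8"))
# ... but XML 1.0 rejects it, whatever the encoding.
print(is_xml10_char(backspace))
# Tab, line feed and carriage return are the only C0 controls allowed.
print([is_xml10_char(c) for c in "\t\n\r"])
```

Note that, going by this production, the restriction only hits the C0 range (apart from tab, LF and CR); U+007F and the C1 range fall inside #x20-#xD7FF.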
Colin Adams wrote:
Unfortunately, that says it all. Control characters are not allowed
in UTF-8 and as a result,
Oh yes they are!
You are all so alert! Like I said to Florent earlier today: I shouldn't
post so late anymore. Yet, reading these posts, I had to look it up to
find out the details, just out of curiosity. The Unicode Standard 4.0 (I
know, XML requires at least v3.1) says in chapter 15.1, and I quote:
"There are 65 code points set aside in the Unicode Standard for
compatibility with the C0 and C1 control codes [....] U+0000 - U+001F,
U+007F, U+0080 - U+009F."
Reading on reveals that in UTF-8 the C0 controls and U+007F are each
encoded as a single byte equal to their code point (byte <03> for
U+0003, etc.), while UTF-16 pads that byte with one NUL byte and UTF-32
with three NUL bytes. Meaning that a byte x08 is indeed legal in UTF-8
(note that for the higher range, U+0080 - U+009F, UTF-8 will encode to
a two-byte sequence).
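A quick way to check this, sketched in Python (the helper name encodings is mine, purely for illustration; the big-endian codec variants are used so that no byte order mark gets in the way):

```python
def encodings(cp: int) -> dict:
    """Return the hex bytes of code point cp in UTF-8, UTF-16BE and UTF-32BE.

    Hypothetical helper for illustration only.
    """
    ch = chr(cp)
    return {
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-32-be": ch.encode("utf-32-be").hex(),
    }


# Two C0 controls, DEL, and one C1 control (U+0085, NEL).
for cp in (0x03, 0x08, 0x7F, 0x85):
    print(f"U+{cp:04X}", encodings(cp))
```

For U+0003 this gives 03, 0003 and 00000003; for U+0085 the UTF-8 form becomes the two bytes c2 85, while UTF-16 and UTF-32 still just pad with NULs.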
Thanks for pointing me to this.
Cheers,
-- Abel Braaksma
http://abelleba.metacarpus.com