Abel Braaksma wrote:
One attribute on xsl:output always causes problems, as far as I could
tell, namely:
* byte-order-mark
When you use it together with UTF-8 it will offset the amount by one.
This is because the byte order mark (xFEFF), when interpreted as a
string, will be translated into the equivalent string representation
in UTF-8, which is the byte sequence xEFBBBF, now representing the
codepoint 65279 (U+FEFF) (Zero Width No Break Space, deprecated but
allowed). This interpretation is in lieu of the Unicode
recommendation. It is useless to put a BOM at the beginning of a UTF-8
stream, so it is best to avoid it.
Oh, I must be sleeping. The analysis above is correct, but the amount
"offset by one" is also correct. A UTF-8 bytestream will never start
with the bytes xFFFE or xFEFF. When the BOM is present in UTF-8, it is
(and must be) encoded as xEFBBBF, that is, the UTF-8 representation of
U+FEFF. Ergo: the total amount (plus three bytes for the BOM) is
correct. Ergo: there are no mistakes in the calculation when using the
mentioned approach.
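
For anyone who wants to check the arithmetic, here is a minimal sketch
in Python (not XSLT, and not tied to any particular processor). The
sample string "<doc/>" and all variable names are made up for
illustration; only the BOM code point and its UTF-8 bytes come from the
discussion above.

```python
# U+FEFF (code point 65279) is the byte order mark.
BOM = "\ufeff"

# Hypothetical serialized output without a BOM (illustrative only).
text = "<doc/>"
with_bom = BOM + text

# In UTF-8 the BOM is encoded as the three bytes EF BB BF.
bom_bytes = BOM.encode("utf-8")
print(bom_bytes.hex())

# The BOM adds one character but three bytes to the UTF-8 output.
chars_added = len(with_bom) - len(text)
bytes_added = len(with_bom.encode("utf-8")) - len(text.encode("utf-8"))
print(chars_added, bytes_added)

# A plain "utf-8" decode keeps U+FEFF as the first character;
# Python's "utf-8-sig" codec strips a leading BOM instead.
raw = with_bom.encode("utf-8")
kept = raw.decode("utf-8")
stripped = raw.decode("utf-8-sig")
print(ord(kept[0]), stripped == text)
```

So whether the BOM shows up as "one" or "three" depends only on
whether you count characters or bytes.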
Sorry for cluttering this thread...
Cheers,
-- Abel Braaksma