----- Original Message -----
From: "Geert Josten" <Geert(_dot_)Josten(_at_)daidalos(_dot_)nl>
To: <xsl-list(_at_)lists(_dot_)mulberrytech(_dot_)com>
Sent: Wednesday, December 07, 2005 7:22 PM
Subject: Re: [xsl] Encoding problem or what else?
Hi Flavio,
I expected this from your first post. The three bytes are the (optional)
UTF-8 Byte Order Mark (BOM). The XML Parser that is used by your XSL
processor does not consume them as it should, resulting in character data
in the prolog, which is obviously not allowed.
It is typical of Microsoft products to write this BOM. WordPad adds it at
save time and consumes it at reading time, so you will never see it in
that editor. You can switch to a different (XML) parser, get rid of the BOM
in your data (can you influence the creation?), or patch the reading
process to consume this BOM.
The second option is perhaps the easiest.
Regards,
Geert
Geert,
this is interesting to know.
What do you mean by "patch"?
Do you mean perhaps that I should write something that strips out the 3
bytes from the beginning of the file?
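A minimal sketch of that idea in Java, assuming the file is opened as a
stream before it reaches the parser. The class and method names
(BomStripper, skipUtf8Bom) are made up for illustration and are not part of
Saxon or Xerces:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

final class BomStripper {
    // UTF-8 encoding of U+FEFF: the three bytes seen at the start of the file.
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    private BomStripper() {}

    /**
     * Returns a stream positioned just after a leading UTF-8 BOM, if present.
     * If the input does not start with the BOM, no bytes are lost.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int read = 0;
        // read() may return fewer bytes than requested, so loop until
        // three bytes have been read or the end of the stream is reached.
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        boolean hasBom = (read == UTF8_BOM.length);
        for (int i = 0; hasBom && i < UTF8_BOM.length; i++) {
            hasBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }
        // Not a BOM: push the bytes back so the parser sees them.
        if (!hasBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }
}
```

The returned stream could then be handed to the processor in place of the
raw file stream, for example wrapped in a javax.xml.transform.stream.StreamSource.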
I think that the easiest solution is to ask the people who deliver this file
to switch to ISO-8859-1, as there is no real need to use Unicode for these
files: there is not going to be any text containing exotic characters in
there.
I am bound to use this XSL processor for the simple reason that it's the
best of the bunch from a performance standpoint (thanks Michael Kay!).
I've been struggling for days with the Altova XSLT 2005 engine and Oracle's
internal processor, and it was a nightmare.
I had a 32 MB XML file that took *hours* to be processed with these
two processors, until I tried out Saxon, which crunched it in less than one
minute!
So, as you can easily guess, I am not going to willingly dump Saxon just for
those three funny bytes.
Hey Michael, what do you think about this?
Is there any hope that Xerces will "consume" this UTF-8 marker in the near
future?
Bye,
Flavio