Geert,
this is interesting to know.
What do you mean by "patch"?
Do you mean perhaps that I should write something that strips out the 3
bytes from the beginning of the file?
Yup, I meant writing (Java?) code. Not in the sense of a separate app that bites ;) the
three bytes off the head of the document, but rather adjusting the reading process in existing code,
provided you have access to it. When the XSLT processor runs inside a larger framework (Cocoon perhaps),
you can often do this fairly easily. When it runs from the command line, typically not.
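
A minimal sketch of that kind of adjustment, assuming the existing code hands the parser a
java.io.InputStream. The class and method names (BomSkipper, skipUtf8Bom) are made up for
illustration and are not part of Saxon, Xerces or Cocoon. It only looks for the three-byte UTF-8
marker (EF BB BF) and leaves every other stream untouched:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of any library mentioned in this thread.
class BomSkipper {

    /**
     * Returns a stream positioned just after a leading UTF-8 byte order mark
     * (EF BB BF) if one is present; otherwise the content is left unchanged.
     */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may deliver fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pushback.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        // No marker: put back whatever was read so the parser sees the whole document.
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

The returned stream could then be wrapped in a javax.xml.transform.stream.StreamSource (or an
org.xml.sax.InputSource) in place of the original one, so the rest of the reading code stays as it is.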
I think that the easiest solution is to ask the people who deliver this
file to switch to ISO-8859-1 as there is no real need to use Unicode for
these files, I mean, there is not going to be any text containing exotic
characters in there.
Yup, that is in line with my second suggestion. But perhaps they can use a different creation tool.
This problem comes up most often when people edit XML documents with a text editor.
I am bound to use this xsl processor for the simple reason that it's the
best of the bunch from a performance standpoint (thanks Michael Kay!).
I've been struggling for days with the Altova XSLT 2005 engine and Oracle's
internal processor and it was a nightmare.
I had a 32 MB XML file that took *hours* to be processed with
these two processors until I tried out Saxon, which crunched it in less
than one minute!
Nice..
So, as you can easily guess, I am not going to willingly dump Saxon just
for those three funny bytes.
No, you shouldn't. But perhaps someone knows a way to configure Saxon such that it uses a different
XML parser front end?
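
For what it's worth, when the processor is driven through the standard JAXP API, the caller can
choose the parser by passing a SAXSource that carries its own XMLReader. The sketch below uses only
JDK classes and an identity transform. The class and method names (ExplicitParserDemo,
identityTransform) are hypothetical, and whether a different parser actually copes with the three
bytes depends on that parser, so treat this as an illustration of the mechanism only:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical demo class; only standard JAXP/SAX calls are used.
class ExplicitParserDemo {

    /** Runs an identity transform, reading the input through an explicitly chosen XMLReader. */
    static String identityTransform(String xml) throws Exception {
        // Any XMLReader implementation could be supplied here; this one is the JAXP default.
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        parserFactory.setNamespaceAware(true);
        XMLReader reader = parserFactory.newSAXParser().getXMLReader();

        // The SAXSource tells the transformer which parser to use for this document.
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(xml)));

        // newTransformer() without a stylesheet gives an identity transform.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(source, new StreamResult(out));
        return out.toString();
    }
}
```

With Saxon on the classpath, TransformerFactory.newInstance() may return Saxon's factory instead of
the JDK's built-in one, depending on how the JAXP lookup is configured.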
Hey Michael, what do you think about this?
Is there any hope that Xerces will "consume" this UTF-8 marker in the
near future?
Cheers..
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list
--~--