Pankaj Bishnoi wrote:
> Hi Owen
> The file starts with <?xml version="1.0" encoding="ISO-8859-1"?>, so I
> think that before transforming, the encoding of the file is changed to
> UTF-8 (the default encoding for the Xalan transformer), and since a UTF-8
> encoded file cannot contain ISO-8859-1 characters, this might be the cause
> of the problem. I am still debugging it.
No. UTF-8 is an encoding of Unicode, which covers every character
from ISO-8859-1. Those characters are simply stored as different bytes.
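As a quick check of that point, here is a small Python sketch using only the standard codecs. It decodes all 256 ISO-8859-1 byte values, re-encodes them as UTF-8, and confirms that the round trip loses nothing. Only the byte lengths differ.

```python
# Every ISO-8859-1 character (code points 0-255) is a Unicode character,
# so every one of them can be encoded in UTF-8.
latin1_text = bytes(range(256)).decode("iso-8859-1")
utf8_bytes = latin1_text.encode("utf-8")

# Round trip: decoding the UTF-8 bytes gives back exactly the same characters.
assert utf8_bytes.decode("utf-8") == latin1_text

# Code points 0-127 take one byte in both encodings; 128-255 take two bytes
# in UTF-8, so the bytes differ even though the characters are the same.
print(len(latin1_text.encode("iso-8859-1")))  # 256
print(len(utf8_bytes))  # 128 * 1 + 128 * 2 = 384
print("é".encode("iso-8859-1"), "é".encode("utf-8"))  # b'\xe9' b'\xc3\xa9'
```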
If you use Eclipse, you can check how your file "looks" as follows:
1. Open the XML file as-is.
2. Right-click the file in the Navigator and click Properties.
3. Check what it says next to "Default (determined from content: ...)";
it should show "ISO-8859-1".
4. Read through your file carefully, looking for small squares
(Eclipse's way of showing unknown characters, characters not in the font,
or illegal characters). If there are any, your file contains illegally
encoded characters.
5. It may be that, because of these illegal characters, Xalan tries to
read the file as UTF-8 (because that is the default for XML). ISO-8859-1
and UTF-8 are not the same for characters above code point 127, and for
those characters it may give this error.
6. Go back to Properties and manually type "UTF-8". Check again
for any small squares.
7. Make a small change: edit the encoding in the XML declaration to
"UTF-8". Eclipse will now automatically and correctly save the file as
UTF-8. Then change it back to "ISO-8859-1". Eclipse will replace any
character that cannot be represented in ISO-8859-1 with a "?". Close and
reopen the file to see whether any characters were changed.
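If you want to reproduce steps 5 and 7 outside Eclipse, the same effects show up with Python's standard codecs. This is a minimal sketch: the sample strings are made up for illustration, and Python's `errors="replace"` is only the analogue of what Eclipse does on save, not Eclipse's own code.

```python
# Made-up sample: "café" saved as ISO-8859-1 bytes.
latin1_bytes = "café".encode("iso-8859-1")  # b'caf\xe9'

# Step 5: reading those bytes as UTF-8 fails, because 0xE9 is the lead byte
# of a three-byte UTF-8 sequence that is never completed.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err)

# Decoding with the encoding the file actually uses works fine.
print(latin1_bytes.decode("iso-8859-1"))  # café

# Step 7: a character outside ISO-8859-1 (here the euro sign, U+20AC) cannot
# be stored in that encoding; errors="replace" substitutes "?" for it.
replaced = "price: 5€".encode("iso-8859-1", errors="replace")
print(replaced)  # b'price: 5?'
```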
If you don't have Eclipse, you can use a text editor that lets you
select and override the encoding. Even a browser will give you some
hints about illegal characters when you select a different encoding from
the View menu. If your editor can search with regular expressions,
search your document with the following expression (or the equivalent in
your regex dialect):
[^\t\n\r\x20-\x7E]+
It will find all "character suspects" that may have been given the wrong
encoding when the file was saved. In fact, it matches every character
(other than tab, newline and carriage return) that could not appear
literally in your file if it were encoded as US-ASCII. US-ASCII is one of
the most basic character sets, and its 128 code points (0 to 127) are
identical in all ISO-8859-x encodings, UTF-8 and many other character sets.
Testing all these suspects one by one (by removing/changing them), you
will quickly find the problem character.
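If you prefer a script to an editor, the search above can be sketched in Python. This is an illustrative helper, not an existing tool: `find_suspects` is a hypothetical name, and the script assumes you pass the file path as its only argument. The file is decoded as ISO-8859-1 because every byte maps to some character in that encoding, so decoding never fails. A UTF-8 character such as "é" then shows up as two suspects ("Ã©"), which is itself a useful hint.

```python
import re
import sys

# Suspects: anything except tab, LF, CR and printable ASCII (0x20-0x7E).
SUSPECT = re.compile(r"[^\t\n\r\x20-\x7E]+")


def find_suspects(text):
    """Return (line, column, snippet) for each run of suspect characters."""
    hits = []
    # Split on "\n" only: str.splitlines() would also split on characters
    # such as 0x85, which are exactly the kind of suspect we want to report.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for m in SUSPECT.finditer(line):
            hits.append((lineno, m.start() + 1, m.group()))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # ISO-8859-1 maps every byte 0x00-0xFF to a character, so this never fails.
    with open(sys.argv[1], encoding="iso-8859-1") as f:
        text = f.read()
    for lineno, col, snippet in find_suspects(text):
        codes = [hex(ord(c)) for c in snippet]
        print(f"line {lineno}, col {col}: {snippet!r} {codes}")
```

For example, `python find_suspects.py input.xml` lists each suspect with its position and byte values, so you can check them one by one.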
Good luck researching!
Cheers,
-- Abel Braaksma
http://www.nuntia.nl
--~------------------------------------------------------------------
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list