Re: [xsl] Invalid byte 2 of 2-byte UTF-8 sequence exception while transforming

Pankaj Bishnoi wrote:

> Hi Owen,
> The file starts with <?xml version="1.0" encoding="ISO-8859-1"?>, so I think
> that before transforming, the encoding of the file is changed to UTF-8 (the
> default encoding for the Xalan transformer). Since a UTF-8 encoded file
> cannot contain ISO-8859-1 characters, this might be the cause of the
> problem. I am still debugging it.

No. UTF-8 is an encoding of Unicode, and Unicode contains every ISO-8859-1 character (as code points U+0000 to U+00FF), so a UTF-8 file can hold all of them.
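To make the point concrete, here is a small Python sketch (my own illustration, not part of Xalan). It decodes all 256 possible ISO-8859-1 bytes, re-encodes them as UTF-8 and checks that nothing is lost. Only the byte representation changes: the helper name `latin1_to_utf8` is hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8 (hypothetical helper)."""
    return data.decode("iso-8859-1").encode("utf-8")


# Every possible ISO-8859-1 byte, 0x00..0xFF.
all_latin1 = bytes(range(256))
converted = latin1_to_utf8(all_latin1)

# Lossless round trip: converting back gives exactly the original bytes.
assert converted.decode("utf-8").encode("iso-8859-1") == all_latin1

# Bytes 0..127 (US-ASCII) stay one byte each; 128..255 become two bytes each.
print(len(all_latin1), len(converted))  # 256 384
```

So "é" is the single byte 0xE9 in ISO-8859-1 but the two bytes 0xC3 0xA9 in UTF-8: same character, different bytes.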


If you use Eclipse, you can test the "looks" of your file as follows:

1. Open the XML file as-is.
2. Right-click the file in the Navigator and click Properties

3. Check "Default (determined from content: ISO-8859-1)". (I mean: check what it says there; it should show "ISO-8859-1".)
4. Read through your file carefully and look for small squares (Eclipse's way of showing unknown characters, characters not in the font, or characters that are illegal). If there are any, your file contains illegal encodings.
5. It may be that, as a result of illegal characters, Xalan tries to read the file as UTF-8 (because that is the default for XML). ISO-8859-1 and UTF-8 are identical for code points 0-127 but encode characters above 127 differently (one byte versus two), and for these characters it may give this error.
6. Go to the Properties again and type "UTF-8" manually. Check again for any little squares.
7. Make a small change, and change the encoding string to "UTF-8". Eclipse will now automatically and correctly save the file as UTF-8. Change it back to ISO-8859-1: Eclipse will replace any character that is not allowed in ISO-8859-1 with a "?" character. Close and reopen the file to see whether it has such changed characters.
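If you want to reproduce steps 5 to 7 outside Eclipse, a rough Python sketch could look like this. It is only an illustration under my own assumptions: `utf8_error_offsets` is a hypothetical helper, not part of Xalan or Eclipse. "Ä" is the single byte 0xC4 in ISO-8859-1; read as UTF-8, 0xC4 starts a 2-byte sequence, but the following byte is not a valid second byte, which is the situation the error in the subject line describes.

```python
def utf8_error_offsets(data: bytes) -> list[int]:
    """Return the byte offsets at which decoding `data` as UTF-8 fails.

    Hypothetical helper, not part of Xalan or Eclipse.
    """
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice data[pos:]
            offsets.append(pos + err.start)
            pos += err.end  # skip the offending byte(s) and keep scanning
    return offsets


# Step 5: ISO-8859-1 bytes read as UTF-8.
latin1_bytes = "Ä und Ö".encode("iso-8859-1")
print(utf8_error_offsets(latin1_bytes))  # [0, 6]

try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print(err.reason)  # invalid continuation byte

# Step 7: characters that ISO-8859-1 cannot hold are replaced by "?".
print("Grüße €".encode("iso-8859-1", errors="replace"))  # b'Gr\xfc\xdfe ?'
```

The same bytes saved as UTF-8 ("Ä und Ö".encode("utf-8")) give an empty list, which is why the declared encoding and the actual bytes must agree.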

If you don't have Eclipse, you can use a text editor where you can select and override the encoding. Even a browser will give you some hints about illegal characters when you select another encoding using the View menu. If you have an editor where you can search with regular expressions, search your document with the following expression (or the equivalent for your regex dialect):


[^\t\n\r\x20-\x7E]+

It will give you all "character suspects" that may have gotten the wrong encoding when the file was saved. In fact, it matches every run of characters outside the printable US-ASCII range (0x20-0x7E), apart from tab, line feed and carriage return. US-ASCII is one of the most basic character sets, and its 128 code points (0-127) are identical in all ISO-8859-X character sets, in UTF-8 and in many other character sets. By testing these suspects one by one (removing or changing them), you will quickly find the problem character.
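If your editor has no regex search, the same check can be scripted. This Python sketch is my own illustration (the name `find_suspects` is hypothetical); it applies the expression above line by line and reports 1-based line and column numbers for each suspect run. Note that the upper bound must be 0x7E ("~"): a range ending at 0x79 would stop at "y" and wrongly flag "z", "{", "|", "}" and "~".

```python
import re

# Anything except tab, LF, CR and printable US-ASCII (0x20-0x7E).
SUSPECT = re.compile(r"[^\t\n\r\x20-\x7E]+")


def find_suspects(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, run) for every suspect run; line/column are 1-based."""
    results = []
    # Split only on "\n": str.splitlines() would also treat characters such as
    # U+0085 as line breaks, hiding exactly the suspects we want to find.
    for lineno, line in enumerate(text.split("\n"), start=1):
        for match in SUSPECT.finditer(line):
            results.append((lineno, match.start() + 1, match.group()))
    return results


sample = "<a>caf\xe9</a>\n<b>x{y}z~</b>"
print(find_suspects(sample))  # [(1, 7, 'é')]
```

To scan a real file, one option is to read it with encoding="iso-8859-1": that decoding accepts every byte, so nothing fails to load and every byte above 127 shows up as a suspect.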


Good luck researching!

Cheers,
-- Abel Braaksma
  http://www.nuntia.nl

--~------------------------------------------------------------------
XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
--~--
