I came across this thread through a web search. The person who
started it was having difficulty in Java converting strange
characters into normal character entities.
I'm doing something similar in that I'm reading a text file generated
by MS Word on a Macintosh and I'd like to automatically change the
weird characters using Java.
My approach is an XML configuration file that lists which
characters to change, such as:
<pair from="Ò" to="&lsquo;"/>
That is, if the program finds the data value 0xD2 in the input stream,
it should replace it with ‘, which it did until I upgraded to
j2sdk1.4.1. Now, after parsing the configuration file, the DOM parser
reports that 0xD2 *isn't* 0xD2 but rather is ? (0x3F).
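
As a rough sketch of the setup described above (not the actual program), the configuration could be loaded and applied along these lines. The class name PairConfigSketch, the <pairs> root element, and the sample text are hypothetical. The sketch also assumes that the file declares its encoding as ISO-8859-1 and that the `to` value is written as `&amp;lsquo;` so it parses without a DTD; neither detail is confirmed by the original configuration.

```java
import java.io.ByteArrayInputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PairConfigSketch {

    // Reads every <pair from="..." to="..."/> element into an ordered map.
    static Map<String, String> loadPairs(byte[] configBytes) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(configBytes));
        NodeList nodes = doc.getElementsByTagName("pair");
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element pair = (Element) nodes.item(i);
            pairs.put(pair.getAttribute("from"), pair.getAttribute("to"));
        }
        return pairs;
    }

    // Applies each from/to replacement to the text, in configuration order.
    static String applyPairs(String text, Map<String, String> pairs) {
        String result = text;
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    // Hypothetical config: the raw byte 0xD2 appears in the attribute, and the
    // XML declaration tells the parser to decode the file as ISO-8859-1.
    static byte[] sampleConfig() throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
                + "<pairs><pair from=\"\u00D2\" to=\"&amp;lsquo;\"/></pairs>";
        return xml.getBytes("ISO-8859-1");
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> pairs = loadPairs(sampleConfig());
        System.out.println(applyPairs("\u00D2quoted", pairs));
    }
}
```

With the encoding declared, the parser has no need to guess how to decode the raw 0xD2 byte; whether a missing declaration is what changed the behaviour after the upgrade is only a guess.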
In a message by Mike Brown in this same thread, there is mention
of escaping the attribute values:
You must always escape the attribute values. You can get
around the need to escape character data content of an
element by using CDATA sections, but I think you'll find
that it's actually just as easy to escape
everything. Entities aren't going to help you.
But is that not what I'm doing above? Or should I make it:
<pair from="&#xd2;" to="&lsquo;"/>
or is there some way to tell Java not to try to interpret 0xD2 but
just accept it?
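
One way to check what the numeric character reference variant yields is a small test like the sketch below. The class name CharRefCheck is hypothetical, and the `to` value is again written as `&amp;lsquo;` (an assumption) so the fragment parses without a DTD. Because the document is parsed from a Java String, no byte-level encoding is involved.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CharRefCheck {

    // Parses a one-element document and returns the first character
    // of its "from" attribute as an int.
    static int fromCodePoint(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element pair = doc.getDocumentElement();
        return pair.getAttribute("from").charAt(0);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<pair from=\"&#xd2;\" to=\"&amp;lsquo;\"/>";
        // The parser expands &#xd2; to the single character U+00D2.
        System.out.println("0x" + Integer.toHexString(fromCodePoint(xml)));
    }
}
```

A character reference is resolved by the XML parser itself, so the value does not depend on how the configuration file's bytes are decoded.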
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list