Chuck White wrote at 10 Sep 2002 07:19:37 -0700:
Windows encodings within the range of 128-159 map out to a variety of
control characters in Unicode, so your problem begins with your source
document, not Xalan.
Don't automatically equate byte values with character numbers (i.e.,
code points).
Bytes in the range 128-159, when read as, say, ISO-8859-1, map to a
variety of control characters.
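
A minimal sketch of that point, assuming Python 3 and its standard
"iso-8859-1" codec (the variable names are mine, purely for
illustration). It decodes bytes 128-159 as ISO-8859-1 and checks what
kind of characters come out:

```python
import unicodedata

# The 32 byte values 128-159 (0x80-0x9F).
raw = bytes(range(0x80, 0xA0))

# Python's ISO-8859-1 codec maps each byte to the code point with the
# same number, so these land on U+0080..U+009F.
text = raw.decode("iso-8859-1")

code_points = [ord(ch) for ch in text]
# General category "Cc" means "control character".
categories = {unicodedata.category(ch) for ch in text}

print(hex(code_points[0]), hex(code_points[-1]), categories)
```

ISO-8859-1 is the special case where byte value and code point happen
to coincide, which is why the result here is control characters rather
than printable text.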
Data in ISO-8859-1 when read as UTF-8 maps to a lot of junk, usually
with a lot of illegal byte sequences. UTF-8 data read as UTF-16
undoubtedly reads as a lot of junk too.
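
As a rough illustration, assuming Python 3 and its built-in codecs (the
sample strings and variable names are my own), here is what reading
data with the wrong encoding looks like:

```python
# "café" stored as ISO-8859-1: the é is the single byte 0xE9.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")

# Read strictly as UTF-8, 0xE9 starts a multi-byte sequence that never
# completes, so the decoder rejects it.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder substitutes U+FFFD for the illegal sequence.
lenient = latin1_bytes.decode("utf-8", errors="replace")

# "naïve" stored as UTF-8 is six bytes; read as little-endian UTF-16,
# each pair of bytes becomes one unrelated character.
utf8_bytes = "na\u00efve".encode("utf-8")
as_utf16 = utf8_bytes.decode("utf-16-le")

print(strict_ok, repr(lenient), [hex(ord(ch)) for ch in as_utf16])
```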
Data in a Windows code page, when read as that same Windows code page
(in an XML context, when the encoding declaration specifies the right
encoding), reads as a variety of characters whose Unicode code points
do not have a 1:1 correspondence with the numeric values of the bytes
used to represent them.
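
A small sketch of this, assuming the code page in question is
windows-1252 and using Python 3's "cp1252" codec (the sample bytes are
ones I picked for illustration; a few values in 128-159 are unassigned
in that code page and are left out):

```python
import unicodedata

# Five bytes from the 128-159 range that windows-1252 assigns to
# printable characters.
samples = bytes([0x80, 0x85, 0x93, 0x94, 0x99])

# Decode with the right encoding: each byte becomes a real character
# whose code point is far from the byte's numeric value.
decoded = samples.decode("cp1252")

for byte, ch in zip(samples, decoded):
    print(f"byte 0x{byte:02X} ({byte}) -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Byte 0x80, for example, comes out as U+20AC EURO SIGN, not as the
control character U+0080.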
Regards,
Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin
mailto:tony(_dot_)graham(_at_)sun(_dot_)com
Sun Microsystems Ireland Ltd Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3 x(70)19708
XSL-List info and archive: http://www.mulberrytech.com/xsl/xsl-list