Re: encoding and XSL Transformation

Chuck White wrote at 10 Sep 2002 07:19:37 -0700:

Windows encodings within the range of 128-159 map out to a variety of
control characters in Unicode, so your problem begins with your source
document, not Xalan.


Don't automatically equate byte values with character numbers (i.e.,
code points).

Bytes in the range 128-159, when read as, say, ISO-8859-1, map to a
variety of control characters (the C1 controls, U+0080 to U+009F).
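
To see this for yourself, here is a small sketch in Python (my choice of
language for illustration; it is not part of the original exchange). It
assumes Python's "iso-8859-1" codec, which maps every byte value
straight to the code point of the same number, and checks the Unicode
general category of each resulting character.

```python
import unicodedata

# The 32 byte values 128-159 (0x80-0x9F).
raw = bytes(range(0x80, 0xA0))

# Read them as ISO-8859-1: each byte becomes the code point of the same number.
text = raw.decode("iso-8859-1")

code_points = [ord(ch) for ch in text]

# "Cc" is the Unicode general category for control characters.
categories = {unicodedata.category(ch) for ch in text}

print(categories)
```

Every one of the 32 characters comes out in category Cc, i.e. a control
character, which is why such bytes look wrong when the data was really
written in a Windows code page.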

Data in ISO-8859-1, when read as UTF-8, maps to a lot of junk, usually
with a lot of illegal byte sequences.  UTF-8 data read as UTF-16
undoubtedly reads as a lot of junk too.
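
A rough illustration of both mismatches, again in Python and using
made-up sample strings ("café" and "abcd" are my own examples, not data
from this thread). The UTF-16 case assumes little-endian byte order and
an even number of bytes; other inputs may fail outright rather than
produce junk.

```python
# "café" in ISO-8859-1 ends with the single byte 0xE9.
latin1_bytes = "café".encode("iso-8859-1")

# Read strictly as UTF-8, 0xE9 starts a multi-byte sequence that never
# completes, so the decoder reports an illegal byte sequence.
try:
    latin1_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A lenient decoder substitutes U+FFFD REPLACEMENT CHARACTER instead.
lenient = latin1_bytes.decode("utf-8", errors="replace")

# Four ASCII letters are four bytes in UTF-8; read as UTF-16 (little-endian),
# each pair of bytes is taken as one 16-bit code unit.
utf16_view = "abcd".encode("utf-8").decode("utf-16-le")
utf16_code_points = [hex(ord(ch)) for ch in utf16_view]

print(strict_ok, repr(lenient), utf16_code_points)
```

In the first case you get an error or a replacement character; in the
second you get two unrelated characters where four letters were
intended.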

Data in a Windows code page, when read as that Windows code page (in
an XML context, when the encoding declaration specifies the right
encoding), reads as a variety of characters whose Unicode code points
do not correspond 1:1 to the numeric values of the bytes used to
represent them.  In windows-1252, for example, byte 147 (0x93) is the
left double quotation mark, U+201C.
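
As a sketch of that last point, the following Python snippet decodes
three bytes as windows-1252 and pairs each byte value with the code
point it produces. The choice of windows-1252 and of these three bytes
is mine, for illustration; "cp1252" is the name Python uses for that
code page.

```python
# Three bytes from the 128-159 range, as they might appear in a
# windows-1252 document: euro sign, left and right double quotation marks.
raw = bytes([0x80, 0x93, 0x94])

# Decode with the encoding the data was actually written in.
text = raw.decode("cp1252")

# Pair each byte value with the Unicode code point of the decoded character.
pairs = [(hex(b), hex(ord(ch))) for b, ch in zip(raw, text)]

print(pairs)
```

The byte values and the code points differ (0x93 becomes U+201C, for
instance), so the byte value tells you nothing about the character
number unless you know the encoding.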

Regards,


Tony Graham
------------------------------------------------------------------------
XML Technology Center - Dublin                
mailto:tony(_dot_)graham(_at_)sun(_dot_)com
Sun Microsystems Ireland Ltd                       Phone: +353 1 8199708
Hamilton House, East Point Business Park, Dublin 3            x(70)19708

 XSL-List info and archive:  http://www.mulberrytech.com/xsl/xsl-list
