Autrijus Tang <autrijus(_at_)autrijus(_dot_)org> writes:
> Also, Encode.pm seems unable to handle '00xy' in the map, where 'x' has its
> highest bit set. There are six such places:
>
> Big5  UCS2  Charname
> ----  ----  -------------------
> A150  00B7  MIDDLE DOT
> A1B1  00A7  SECTION SIGN
> A1D1  00D7  MULTIPLICATION SIGN
> A1D2  00F7  DIVISION SIGN
> A1D3  00B1  PLUS-MINUS SIGN
> A258  00B0  DEGREE SIGN
>
> For example, decode('big5', "\xA1\x50") simply equals "\xB7" instead
> of the required "\xC2\xB7" UTF-8 expansion form. Can this be fixed?
What you see in perl is the Unicode code point number, _NOT_ the UTF-8
encoding. If you want the UTF-8 octet sequence, you need to call
encode('UTF-8', ...) (or one of the shortcuts for that).
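The distinction can be illustrated outside Perl as well. What follows is a minimal sketch in Python, assuming Python's str/bytes split is a fair stand-in for a Perl character string versus the output of encode('UTF-8', ...). The table is copied from the quoted message, no Big5 decoding is performed, and the helper utf8_two_byte() is hypothetical, written only to show the bit arithmetic. Each decoded value is one character whose code point is, for example, 0xB7; only encoding it yields the two octets C2 B7.

```python
# Big5 code -> Unicode code point, copied from the table in the quoted message.
BIG5_TO_UCS2 = {
    0xA150: 0x00B7,  # MIDDLE DOT
    0xA1B1: 0x00A7,  # SECTION SIGN
    0xA1D1: 0x00D7,  # MULTIPLICATION SIGN
    0xA1D2: 0x00F7,  # DIVISION SIGN
    0xA1D3: 0x00B1,  # PLUS-MINUS SIGN
    0xA258: 0x00B0,  # DEGREE SIGN
}


def utf8_two_byte(cp: int) -> bytes:
    """Hypothetical helper: hand-encode U+0080..U+07FF as 110xxxxx 10xxxxxx."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point does not use the two-byte UTF-8 form")
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])


for big5, cp in BIG5_TO_UCS2.items():
    text = chr(cp)                 # decoded string: one character, code point cp
    octets = text.encode("utf-8")  # UTF-8 octet sequence: two bytes
    print(f"{big5:04X} -> U+{cp:04X} -> {octets.hex(' ').upper()}")
```

The first line printed is `A150 -> U+00B7 -> C2 B7`: the string itself holds a single code point, and the "\xC2\xB7" form appears only after an explicit encoding step.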
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/