perl-unicode

Re: Some Encode::TW test results.

2002-02-20 07:57:03
Autrijus Tang <autrijus(_at_)autrijus(_dot_)org> writes:
Also, Encode.pm seems unable to handle '00xy' in the map, where 'x' has its
highest bit set. There are six such places:

Big5 UCS2 Charname
-----------------------------
A150 00B7 MIDDLE DOT
A1B1 00A7 SECTION SIGN
A1D1 00D7 MULTIPLICATION SIGN
A1D2 00F7 DIVISION SIGN
A1D3 00B1 PLUS-MINUS SIGN
A258 00B0 DEGREE SIGN

For example, decode('big5', "\xA1\x50") simply returns "\xB7", instead
of the required "\xC2\xB7" UTF-8 expansion form. Can this be fixed?

What you see in Perl is the Unicode code point number, _NOT_ the UTF-8
encoding. If you want the UTF-8 octet sequence, you need to call
encode('UTF-8', ...) (or one of the shortcuts for that).
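
A minimal sketch of the distinction, assuming a reasonably recent Encode.
To avoid depending on any particular Big5 map, it starts from the code
point U+00B7 (the value the table above lists for Big5 A150) rather than
from a decode('big5', ...) call; the variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The character string that decoding Big5 A150 should produce, per the
# table above: a single character whose code point is U+00B7.
my $char = chr(0x00B7);

# Explicitly encoding that character string yields the UTF-8 octets.
my $octets = encode('UTF-8', $char);

# Render the octets as hex so the difference is visible.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $octets;

printf "code point U+%04X, %d character\n", ord($char), length($char);
printf "UTF-8 octets: %s (%d octets)\n", $hex, length($octets);
```

So "\xB7" here is one character, not a truncated octet sequence; the
"\xC2\xB7" form only appears once the string is encoded for output.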

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
