perl-unicode

Re: Some Encode::TW test results.

2002-02-20 06:29:13
On Wed, Feb 20, 2002 at 07:52:19AM +0000, Nick Ing-Simmons wrote:
> If you - as perl's Big5 expert - say that that is the one to go with that
> is good enough for me.

Alright. I think we could just use Big5+'s First Standard Segment group
as the map here -- it's superior to both Tcl's and iconv's maps in several
places, and afaik has no obvious shortcomings.

"compile" can take two forms - Tcl's .enc files which are packed UCS2
values - and ICU's .ucm files which are human readable and commentable
text files. (Compile can also convert between the two.)

http://autrijus.org/big5.enc.bz2 is the massaged map. The only adjustment
I made is to allow 00A0 and 00FA..00FF to retain their meaning, instead of
ruling them as 'unmapped' characters. The Big5+ spec is undefined on this
point, and the adjustment makes conversion of legacy documents slightly
easier.

Also, Encode.pm seems unable to handle '00xy' in the map, where 'x' has its
highest bit set. There are six such places:

Big5 UCS2 Charname
-----------------------------
A150 00B7 MIDDLE DOT
A1B1 00A7 SECTION SIGN
A1D1 00D7 MULTIPLICATION SIGN
A1D2 00F7 DIVISION SIGN
A1D3 00B1 PLUS-MINUS SIGN
A258 00B0 DEGREE SIGN

For example, decode('big5', "\xA1\x50") simply returns "\xB7", instead
of the required "\xC2\xB7" UTF-8 expanded form. Can this be fixed?
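
For reference, here is a minimal sketch that prints the UTF-8 byte
sequence expected for each of the six code points in the table above. It
is not part of Encode::TW; it assumes a perl with a working Encode module,
and utf8_hex is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The six code points in the U+0080..U+00FF range from the table above.
my @code_points = (0xB7, 0xA7, 0xD7, 0xF7, 0xB1, 0xB0);

# Return the UTF-8 byte sequence for a code point as uppercase hex.
# (utf8_hex is a hypothetical helper, not an Encode function.)
sub utf8_hex {
    my ($cp) = @_;
    return uc(unpack('H*', encode('UTF-8', chr($cp))));
}

# Print one "U+XXXX -> bytes" line per code point.
printf "U+%04X -> %s\n", $_, utf8_hex($_) for @code_points;
```

Every code point in this range should expand to two bytes; U+00B7, for
instance, comes out as C2B7, which is the form decode() is expected to
produce for Big5 A150.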

/Autrijus/

