perl-unicode

Some Encode::TW test results.

2002-02-18 17:43:41
Being a native Big5/GB (and HKSCS, Big5+, etc.) user, I'm extremely happy to
see Dan's work on Encode.pm. :-)

At jhi's bidding, I did some rudimentary tests on the standard Big5 encoding
range, with iconv 2.0 as the reference point. Within the [A1-F9][40-7E,A1-FE]
range, the results were as follows:
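The tested code space can be enumerated as in the Python sketch below. It is only an illustration of the [A1-F9][40-7E,A1-FE] range, not the script actually used for these results. The helper `in_block` is hypothetical. It checks whether a code falls inside one of the numeric blocks discussed below, such as A140 - A3BF.

```python
from typing import Iterator

# Lead bytes A1..F9; trail bytes 40..7E or A1..FE (the [A1-F9][40-7E,A1-FE] range).
LEAD_BYTES = range(0xA1, 0xFA)
TRAIL_BYTES = list(range(0x40, 0x7F)) + list(range(0xA1, 0xFF))


def big5_codes() -> Iterator[bytes]:
    """Yield every two-byte candidate code in the tested range, in ascending order."""
    for lead in LEAD_BYTES:
        for trail in TRAIL_BYTES:
            yield bytes([lead, trail])


def in_block(code: bytes, start: int, end: int) -> bool:
    """True if the code, read as a big-endian 16-bit value, lies in [start, end].

    Hypothetical helper: e.g. in_block(c, 0xA140, 0xA3BF) selects the
    punctuation/phonetic block. Only valid trail bytes are ever generated,
    so gaps like 7F..A0 inside a block are skipped automatically.
    """
    return start <= int.from_bytes(code, "big") <= end


if __name__ == "__main__":
    codes = list(big5_codes())
    print(len(codes))  # 89 lead bytes * 157 trail bytes = 13973
```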

1. In the A140 - A3BF range (punctuation and phonetic symbols), iconv parsed
   everything without errors; Encode, however, disagrees with it in 3 places:

   * big5(A150) is not mapped correctly.
   * it has an off-by-one error in the range big5(A15A..A17D): it maps
     big5(A15A) to ucs2(big5(A15B)), big5(A15B) to ucs2(big5(A15C)), etc.
   * it cannot parse the range big5(A17E..A3BF).

2. In the A440 - C67E range (widely-used characters), both iconv and Encode
   worked perfectly.

3. In the C6A1 - C8D3 range (word parts, Japanese characters, and assorted
   symbols), neither Encode nor iconv works beyond big5(C7FC), which
   is expected.

4. In the C940 - F9D5 range (rarely used characters), both iconv and Encode
   worked perfectly.

5. In the F9D6 - F9FE range (addendum, table-drawing characters), neither of
   them works, which is expected.

6. I didn't test utf8=>big5 much, but it seems to work all right.
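The three kinds of disagreement in item 1 (a wrong mapping, an off-by-one shift, an unparsed range) can be told apart automatically. Below is a minimal Python sketch that assumes each decoder is wrapped as a function returning a string or None. `classify` and `python_big5` are hypothetical names. Python's built-in "big5" codec appears only as a convenient stand-in. It is not iconv 2.0 or Encode.

```python
from typing import Callable, Dict, List, Optional, Sequence

# A decoder maps one two-byte code to a Unicode string, or None if it fails.
Decoder = Callable[[bytes], Optional[str]]


def python_big5(code: bytes) -> Optional[str]:
    """Decoder backed by Python's built-in 'big5' codec (a stand-in reference)."""
    try:
        return code.decode("big5")
    except UnicodeDecodeError:
        return None


def classify(codes: Sequence[bytes], test: Decoder, ref: Decoder) -> Dict[str, List[bytes]]:
    """Sort every disagreement between `test` and `ref` into three buckets.

    - 'unparsed': test returns None where ref decodes the code
    - 'shifted':  test(code) equals ref(next code), i.e. an off-by-one
    - 'wrong':    any other mismatch
    """
    result: Dict[str, List[bytes]] = {"unparsed": [], "shifted": [], "wrong": []}
    for i, code in enumerate(codes):
        got, want = test(code), ref(code)
        if got == want:
            continue
        if got is None:
            result["unparsed"].append(code)
        elif i + 1 < len(codes) and got == ref(codes[i + 1]):
            result["shifted"].append(code)
        else:
            result["wrong"].append(code)
    return result
```

A shift like the one reported for big5(A15A..A17D) would then appear as a run of consecutive codes in the 'shifted' bucket. Wrapping the actual decoders under test (for example, a call out to iconv) is left out here.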

Note that the Big5+ spec at <http://www.cmex.org.tw/download-b5.html> specifies
a rather comprehensive set of official big5<=>ucs2 mappings; the relevant
part of it is available at <http://autrijus.org/big5-ucs.tar.gz>. The format
should be self-descriptive. I wonder whether it's possible to use that table to
fill in the missing codepoints, or whether we should add a 'big5p' encoding?
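If the table were used to fill gaps, a conservative merge would only add codes the existing mapping lacks and would never override an entry. The Python sketch below takes that approach. The two-column hex line format that `parse_table` accepts is an assumption for illustration only. The real big5-ucs files may be laid out differently. Both `parse_table` and `fill_missing` are hypothetical names.

```python
from typing import Dict, Iterable, List, Tuple


def parse_table(lines: Iterable[str]) -> Dict[int, int]:
    """Parse lines of an ASSUMED form '<big5 hex> <unicode hex>'.

    Blank lines and '#' comments are ignored; a '0x' prefix is optional.
    This format is hypothetical, not taken from the Big5+ distribution.
    """
    table: Dict[int, int] = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        big5_hex, ucs_hex = line.split()[:2]
        table[int(big5_hex, 16)] = int(ucs_hex, 16)
    return table


def fill_missing(base: Dict[int, int], extra: Dict[int, int]) -> Tuple[Dict[int, int], List[int]]:
    """Return a copy of `base` with codes it lacks taken from `extra`,
    plus the sorted list of codes that were added. Entries already in
    `base` are never overridden."""
    merged = dict(base)
    added = [code for code in sorted(extra) if code not in base]
    for code in added:
        merged[code] = extra[code]
    return merged, added
```

Keeping existing entries authoritative makes the change purely additive. The alternative raised above, a separate 'big5p' encoding, would avoid touching the existing big5 table at all.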

Anyway, I'll get some more tests (and get GB working) when I wake up.

Hope that helps,
/Autrijus/
