perl-unicode

Re: Some Encode::TW test results.

2002-02-25 17:07:43
On Tue, Feb 19, 2002 at 11:57:11AM +0900, Dan Kogai wrote:
  The other major coding that is missing is obviously CNS11643.  I don't
know much about it, but as far as I know CNS11643 is ISO-2022 compliant
and CNS11643-1 and CNS11643-2 cover Big5.

Hmm?  It occurs to me that
http://archive.develooper.com/perl5-porters(_at_)perl(_dot_)org/msg60957.html

already has a CNS11643 map, provided by SADAHIRO Tomoyuki. The CNS tools in
Taiwan are proprietary, though, so I won't be able to verify its accuracy
beyond the unicode.org reference map.

  But as you see, Encode::XX is so far dependent on the Tcl encodings and
there is no CNS11643 there yet....

So, are there known problems with using the above maps?

Anyway, I'll get some more tests (and get GB working) when I wake up.
  It does!  Now we are looking for testers of KR and CN as well.  Anyone?

GB2312 (CN) is completely broken: it rejects every valid GB input I could
muster (in the EUC-CN, HZ and GBK encodings). I suspect the original Tcl map
is broken as well, since it lists itself as a type D (double-byte) mapping,
but in practice GB text is almost always an M type (multi-byte) encoding,
mixing single-byte ASCII with two-byte characters that have 0xA1-0xFE as
'lead' bytes. GB12345 is similarly broken.
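
To illustrate the D versus M distinction, here is a minimal Python sketch
(not Encode or Tcl code; the function name split_euc_cn is made up for this
example, and it relies only on Python's built-in "gb2312" codec). It shows
why a pure double-byte reading cannot work: an EUC-CN stream mixes one-byte
ASCII with two-byte units, and only the lead byte tells you which is which.

```python
def split_euc_cn(data: bytes) -> list[bytes]:
    """Split an EUC-CN byte string into 1-byte and 2-byte units.

    Assumption: any byte in 0xA1-0xFE starts a two-byte character,
    and every other byte stands alone (ASCII).
    """
    units = []
    i = 0
    while i < len(data):
        if 0xA1 <= data[i] <= 0xFE:
            # Lead byte: the next byte belongs to the same character.
            if i + 1 >= len(data):
                raise ValueError("truncated two-byte sequence")
            units.append(data[i:i + 2])
            i += 2
        else:
            # Plain single-byte character.
            units.append(data[i:i + 1])
            i += 1
    return units


# Mixed ASCII and Chinese text, encoded as EUC-CN.
sample = "GB: 中文".encode("gb2312")
units = split_euc_cn(sample)
print([u.hex() for u in units])
```

A map declared as type D would try to pair up the ASCII bytes too, which is
consistent with valid input being rejected.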

I'll see what I can do to regenerate their maps, either from 
http://www.unicode.org/Public/MAPPINGS/ or other official sources.
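
As a rough idea of what regenerating a map could involve, here is a Python
sketch that reads a mapping table in the tab-separated "0xCODE 0xUCS #
comment" layout. Everything here is an assumption for illustration: the
helper names parse_mapping and to_euc are invented, the two sample lines are
a hand-typed excerpt rather than a real file, and the table is assumed to
use 7-bit row/cell codes that become EUC-CN by setting the high bit of each
byte.

```python
def parse_mapping(text: str) -> dict[int, int]:
    """Parse lines of the form '0xCODE<TAB>0xUCS<TAB># comment'."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


def to_euc(code: int) -> int:
    """Turn a 7-bit row/cell code into its EUC form (high bits set)."""
    return code | 0x8080


# Hypothetical excerpt in the assumed layout, not a complete table.
sample_map = """\
# illustrative lines only
0x2121\t0x3000\t# IDEOGRAPHIC SPACE
0x5650\t0x4E2D\t# <CJK>
"""

table = parse_mapping(sample_map)
for code, ucs in sorted(table.items()):
    print(f"{to_euc(code):04X} -> U+{ucs:04X}")
```

The result can be cross-checked against any existing GB2312 decoder before
it is turned into a new map file.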

Thanks,
/Autrijus/
