perl-unicode

Re: [PATCH] Big5-related changes.

2002-04-19 14:04:55
On Saturday, April 20, 2002, at 04:53 , Autrijus Tang wrote:
> I've been immersed in Big5-related issues in the past few days, and
> came back with these last-minute (err, week?) changes before 5.8-RC1.
>
> The diff contains fixes to TW.pm, Alias.pm, and README.(tw|cn).

Excellent!

> (For Dan) big5-hkscs should be upgraded to the 2001 edition, as per
> the Hong Kong government's decree. It's available separately at:
>
>     http://egb.elixus.org/~autrijus/big5-hkscs.ucm.gz
>
> Also, please delete big5.ucm and replace it with big5-eten, at:
>
>     http://egb.elixus.org/~autrijus/big5-eten.ucm.gz

Thus updated. I needed to update TW/Makefile.PL and lib/Encode/Config.pm (so it loads on 'big5-eten' instead of just 'big5'), but that's not a big deal at all.

> I've fixed Alias.pm so big5 aliases to big5-eten. The reason is that
> 'Big5' as originally defined isn't used anywhere on earth; non-Microsoft
> systems use 'big5' to mean 'big5-eten', and Microsoft uses 'big5' to
> mean 'cp950'.
>
> It is therefore unwise to have a canonical 'big5' encoding, much like
> there should not be a 'gb2312' encoding. Since gb2312 is now aliased
> to euc-cn and not cp936, I think big5 should alias to big5-eten and
> not cp950.

I agree. AFAIK, Big5 is the only major CJK encoding not endorsed by its government. What's funny is that there seems to be less confusion between encodings in Taiwan than in Japan or Korea. Japan is the worst, using Shift_JIS, EUC-JP, ISO-2022-JP(-[12])?, and now Unicode. IMHO, however, the Japanese people should be proud of making multibyte character encoding a reality. But I can't help wondering whether this mess is too high a price to pay :)
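For anyone who wants to check how the aliasing discussed above behaves on their own installation, here is a minimal sketch. It assumes an Encode release in which the Alias.pm change is already present; the list of names probed and the sample string are only illustrations, and an older Encode may report different canonical names.

```perl
use strict;
use warnings;
use Encode qw(resolve_alias encode decode);

# resolve_alias() returns the canonical name that a given name maps to,
# or a false value if Encode does not know the name at all.
for my $name (qw(big5 big5-eten cp950 gb2312)) {
    my $canonical = resolve_alias($name);
    printf "%-10s => %s\n", $name, $canonical || '(unknown)';
}

# Round-trip a short string through whatever 'big5' resolves to.
# The three characters are the ones spelled out in the signature below.
my $text  = "\x{8f9b}\x{82e6}\x{4e86}";
my $bytes = encode('big5', $text);
my $back  = decode('big5', $bytes);
print "round trip ", ($back eq $text ? "ok" : "FAILED"), "\n";
```

If the alias is in place, 'big5' and 'big5-eten' should report the same canonical name, while 'cp950' stays a separate encoding.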

> Oh, I just noticed that Dan retained the 'gb2312.ucm' name, although
> the encoding is called 'gb2312-raw'. I admit that I don't fully
> understand the reason, but if that's to stand, then big5-eten could also
> be named 'big5.ucm', and still say '<code_set_name> "big5-eten"', for
> consistency's sake.

I renamed big5.ucm to big5-eten.ucm. The "-raw" suffix is missing from the *.ucm filenames only because such names look too funny on 8.3 filesystems, nothing more :)

> Thanks,
> /Autrijus/

Xin     Ku      Le      !
\x{8f9b}\x{82e6}\x{4e86}

Xiao    Si       Dan
\x{5c0f}\x{98fc} \x{5f3e}\n