perl-unicode

Re: Fwd: GB 1833 name ambiguous

2002-03-20 17:05:21
On Wed, Mar 20, 2002 at 05:50:42PM +0300, Anton Tagunov wrote:
> What we have is an ambiguous name. If we meant the 8-bit
> encoding (that has a MIME name) it should have been GB2312, not
> GB 2312.

This has been raised before, at:
http://archive.develooper.com/perl-unicode(_at_)perl(_dot_)org/msg00819.html

> But we have the 7-bit encoding. What name should it have so that
> it is not mistaken for the other one?

...in which I proposed renaming gb2312 to gb2312-raw to avoid the
ambiguity. The *other*, 8-bit MIME encoding is euc-cn.

And yes, this disagrees with the conventions of iconv and hc, in
which gb2312 is an alias for euc-cn, and the raw gb2312 is not
directly accessible:

* EUC-CN = GB2312
    We implement this because it is the widely used representation
    of simplified Chinese.

Thus, the "=?GB2312?B?0LvQu8Tjo6E=?=" spam received by NI-S is
not encoded in perl's GB2312, but is "Thank you!" in EUC-CN.
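
To make the byte-level point concrete, here is a minimal sketch in Python
(not Perl, and not Encode.pm itself). It assumes only that the raw 7-bit
form is the EUC-CN bytes with the high bit cleared; the variable names are
illustrative.

```python
import base64

# Payload of the encoded word quoted above: =?GB2312?B?0LvQu8Tjo6E=?=
payload = "0LvQu8Tjo6E="
raw = base64.b64decode(payload)

# Every byte has its high bit set, which marks the 8-bit (EUC-CN) form.
high_bit_set = all(b >= 0xA1 for b in raw)

# Python's "euc-cn" is an alias of its "gb2312" codec, i.e. the MIME/iconv
# convention in which GB2312 means EUC-CN.
text = raw.decode("euc-cn")

# Clearing the high bit gives the 7-bit row/cell bytes for the same text.
seven_bit = bytes(b & 0x7F for b in raw)

print(text)       # the Chinese text for "Thank you!"
print(seven_bit)  # the same characters as 7-bit byte pairs
```

The 7-bit bytes all fall in the printable ASCII range, so a decoder that
expects them cannot make sense of the 8-bit payload as sent.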

Executive summary: Encode.pm isn't just for transport use, so we
have a namespace clash; neither interpretation of GB2312 is 'more
right' than the other. But as the main use of Encode.pm would be
(imho) in IO disciplines and "use encoding;", I'd suggest:

  - Retain the file gb2312.enc.
  - Alias /gb-?2312/i to 'euc-cn'.
  - Make a 'gb2312-raw' to point to 'gb2312.enc'.
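
The three steps above could resolve names roughly as follows. This is a
hypothetical sketch in Python, not Encode.pm's actual alias machinery; the
`resolve` function and the `CANONICAL` set are made up for illustration.

```python
import re

# Canonical encoding names under the proposal (illustrative only).
CANONICAL = {"euc-cn", "gb2312-raw"}

# The proposed alias pattern, /gb-?2312/i, written as a Python regex.
ALIAS = re.compile(r"gb-?2312", re.IGNORECASE)

def resolve(name):
    """Return the canonical encoding name, or None if it is unknown."""
    key = name.lower()
    # Exact names are checked first: the unanchored alias pattern would
    # otherwise also match "gb2312-raw" and send it to euc-cn.
    if key in CANONICAL:
        return key
    if ALIAS.search(name):
        return "euc-cn"
    return None

print(resolve("GB2312"))      # euc-cn
print(resolve("gb2312-raw"))  # gb2312-raw
```

The ordering is the design choice worth noting: with the alias tried first,
the raw table would become unreachable by name.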
  
Makes sense?

/Autrijus/
