perl-unicode

Fwd: GB 1833 name ambiguous

2002-03-20 07:56:36
This is a forwarded message
From: Anton Tagunov <tagunov(_at_)motor(_dot_)ru>
To: Nick Ing-Simmons <nick(_dot_)ing-simmons(_at_)elixent(_dot_)com>
Date: Tuesday, March 19, 2002, 7:12:26 PM
Subject: ISO-8859-1 vs ISO 8859-1 (typo + UTF8 case too :)

===8<==============Original message text===============
Hello Nick! Hello, all!

Confessed - I'm a complete MORON. Dot.

Yes, GB2312 is an encoding.
I have finally read rfc1345. Should have done it earlier.

AT> GB 2312
AT> is GB 2312 valid as a parameter to Encode::encode?

NIS> We have a gb2312.enc
NIS> FWIW (and worth has to be -ve)

? what is -ve ? :-)

NIS> I get a daily pile of SPAM with Subjects
NIS> Subject: =?GB2312?B?0LvQu8Tjo6E=?=
NIS> So something thinks it is an encoding.

It is :-)
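
For what it's worth, that Subject can be unpacked by hand to see which form
the mailers mean. A minimal sketch, in Python rather than Perl only to keep it
self-contained; the "gb2312" used below is Python's codec name for the 8-bit
form and is an assumption of this sketch, not a statement about what Encode
calls anything:

```python
import base64

# The encoded-word from the quoted Subject line: =?charset?B?payload?=
subject = "=?GB2312?B?0LvQu8Tjo6E=?="

# Splitting on "?" gives ["=", charset, transfer-encoding, payload, "="].
_, charset, transfer, payload, _ = subject.split("?")

# "B" means the payload is base64.
raw = base64.b64decode(payload)

# Every byte of the Chinese text should fall in the 8-bit range 0xA1-0xFE.
in_high_range = all(0xA1 <= b <= 0xFE for b in raw)

# Python's "gb2312" codec handles the 8-bit (MIME) form.
text = raw.decode("gb2312")

print(charset, transfer, raw.hex(), in_high_range, text)
```

All eight payload bytes have the high bit set, so the label in that Subject
refers to the 8-bit form.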

But I see we're still in trouble:

CN.pm:--------------------------------------------------------------

  gb2312        The raw (low-bit) GB2312 character map
  
IANA registry:------------------------------------------------------

  Name: GB_2312-80                                    [RFC1345,KXS2]
  MIBenum: 57
  Source: ECMA registry
  Alias: iso-ir-58

  Name: GB2312  (preferred MIME name)
  MIBenum: 2025
  Source: Chinese for People's Republic of China (PRC) mixed one byte,
        two byte set: 
          20-7E = one byte ASCII 
          A1-FE = two byte PRC Kanji 

rfc1345:------------------------------------------------------------
  &charset GB_2312-80
  &rem source: ECMA registry
  &alias iso-ir-58
  &bits 16
  &code2 1 1

What this means is: the IANA registry has two entries
- Name: GB_2312-80                                    [RFC1345,KXS2]
    corresponds to rfc1345's iso-ir-58 and denotes the
    7-bit, two-bytes-per-char Chinese encoding (each byte in the
    range 0x21-0x7E). According to the docs in CN.pm this is what we have
- Name: GB2312  (preferred MIME name)
    denotes the 8-bit encoding that has ASCII in
    0x20-0x7E and encodes the same Chinese chars in
    two bytes, each byte in the range 0xA1-0xFE
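
The two byte ranges quoted above differ only in the high bit (0xA1-0xFE is
0x21-0x7E plus 0x80), so one form can be turned into the other per byte. A
rough sketch under that assumption; to_7bit and to_8bit are hypothetical
helper names, and the "gb2312" codec is Python's name for the 8-bit form,
not anything taken from Encode:

```python
def to_7bit(eight_bit: bytes) -> bytes:
    """Clear the high bit of each byte: 0xA1-0xFE becomes 0x21-0x7E."""
    if not all(0xA1 <= b <= 0xFE for b in eight_bit):
        raise ValueError("expected only two-byte Chinese chars (0xA1-0xFE)")
    return bytes(b & 0x7F for b in eight_bit)


def to_8bit(seven_bit: bytes) -> bytes:
    """Set the high bit of each byte: 0x21-0x7E becomes 0xA1-0xFE."""
    if not all(0x21 <= b <= 0x7E for b in seven_bit):
        raise ValueError("expected only bytes in 0x21-0x7E")
    return bytes(b | 0x80 for b in seven_bit)


# Two Chinese characters in the 8-bit form, via Python's "gb2312" codec.
eight = "\u4e2d\u6587".encode("gb2312")

# The same characters in the raw 7-bit form.
seven = to_7bit(eight)

print(eight.hex(), seven)
```

The 7-bit bytes are all printable ASCII, so nothing in the data itself tells
the two forms apart; only the name does.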

What we have is an ambiguous name. If we meant the 8-bit
encoding (the one that has a MIME name), it should have been GB2312, not
GB 2312.
But we have the 7-bit encoding. What name should it have so as not
to be mistaken for the other one?

- Anton

===8<===========End of original message text===========



-- 
Best regards,
 Anton                            mailto:tagunov(_at_)motor(_dot_)ru

