This is a forwarded message
From: Anton Tagunov <tagunov(_at_)motor(_dot_)ru>
To: Nick Ing-Simmons <nick(_dot_)ing-simmons(_at_)elixent(_dot_)com>
Date: Tuesday, March 19, 2002, 7:12:26 PM
Subject: ISO-8859-1 vs ISO 8859-1 (typo + UTF8 case too :)
===8<==============Original message text===============
Hello Nick! Hello, all!
I confess - I'm a complete MORON. Period.
Yes, GB2312 is an encoding.
I have finally read rfc1345. Should have done it earlier.
AT> GB 2312
AT> is GB 2312 valid as a parameter to Encode::encode?
NIS> We have a gb2312.enc
NIS> FWIW (and worth has to be -ve)
? what is -ve ? :-)
NIS> I get a daily pile of SPAM with Subjects
NIS> Subject: =?GB2312?B?0LvQu8Tjo6E=?=
NIS> So something thinks it is an encoding.
It is :-)
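For anyone who wants to check that Subject line by hand, here is a minimal sketch in Python (my choice for illustration, not part of Encode). It assumes only the standard email.header module and Python's built-in "gb2312" codec, which implements the 8-bit MIME flavour:

```python
from email.header import decode_header

# The encoded-word quoted above: charset GB2312, "B" (base64) encoding.
subject = "=?GB2312?B?0LvQu8Tjo6E=?="

# decode_header returns a list of (bytes, charset) pairs; there is one here.
(raw, charset), = decode_header(subject)

# Every payload byte is in 0xA1-0xFE, i.e. this is the 8-bit encoding.
all_high = all(0xA1 <= b <= 0xFE for b in raw)

# Python's "gb2312" codec is the 8-bit (EUC-CN style) form, so this decodes.
text = raw.decode(charset)

print(charset, raw.hex(), all_high, len(text))
```

The payload is eight bytes, all with the high bit set, and decodes to four characters, so whatever produced that mail really did mean the 8-bit encoding.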
But I see we're still in trouble:
CN.pm:--------------------------------------------------------------
gb2312 The raw (low-bit) GB2312 character map
IANA registry:------------------------------------------------------
Name: GB_2312-80 [RFC1345,KXS2]
MIBenum: 57
Source: ECMA registry
Alias: iso-ir-58
Name: GB2312 (preferred MIME name)
MIBenum: 2025
Source: Chinese for People's Republic of China (PRC) mixed one byte,
two byte set:
20-7E = one byte ASCII
A1-FE = two byte PRC Kanji
rfc1345:------------------------------------------------------------
&charset GB_2312-80
&rem source: ECMA registry
&alias iso-ir-58
&bits 16
&code2 1 1
What this means is that the IANA registry has two entries:
- Name: GB_2312-80 [RFC1345,KXS2]
corresponds to rfc1345's iso-ir-58 and denotes a
7-bit, two-bytes-per-char Chinese encoding (each byte ranges over
0x21-0x7E). According to the docs in CN.pm, this is what we have.
- Name: GB2312 (preferred MIME name)
denotes an 8-bit encoding that has ASCII in
0x20-0x7E and encodes Chinese (the same chars) as
two bytes per char, each byte ranging over 0xA1-0xFE.
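To make the difference between the two entries concrete, here is a rough sketch in Python (again just for illustration). The helper names euc_to_raw7 and raw7_to_euc are made up for this example. It relies on the fact that the 8-bit form of a Chinese character is the 7-bit GB_2312-80 pair with the high bit set on both bytes, and it only handles the two-byte characters, not the ASCII range:

```python
def euc_to_raw7(euc: bytes) -> bytes:
    """Hypothetical helper: 8-bit GB2312 pairs -> raw 7-bit GB_2312-80 pairs."""
    if len(euc) % 2 or any(not 0xA1 <= b <= 0xFE for b in euc):
        raise ValueError("expected two-byte pairs in 0xA1-0xFE only")
    return bytes(b & 0x7F for b in euc)  # clear the high bit of each byte


def raw7_to_euc(raw: bytes) -> bytes:
    """Hypothetical helper: raw 7-bit GB_2312-80 pairs -> 8-bit GB2312 pairs."""
    if len(raw) % 2 or any(not 0x21 <= b <= 0x7E for b in raw):
        raise ValueError("expected two-byte pairs in 0x21-0x7E only")
    return bytes(b | 0x80 for b in raw)  # set the high bit of each byte


# U+4E2D, a common Chinese character, in Python's 8-bit "gb2312" codec.
euc = "\u4e2d".encode("gb2312")
raw7 = euc_to_raw7(euc)

print(euc.hex(), raw7.hex())  # d6d0 5650
```

The same character is D6 D0 under the MIME name GB2312 and 56 50 in the raw map, which is why a single name cannot safely cover both.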
What we have is an ambiguous name. If we meant the 8-bit
encoding (the one that has a MIME name), it should have been GB2312, not
GB 2312.
But we have the 7-bit encoding. What name should it have so that
it is not mistaken for the other one?
- Anton
===8<===========End of original message text===========
--
Best regards,
Anton mailto:tagunov(_at_)motor(_dot_)ru