perl-unicode

Re[2]: gb2312 (whatever it is) refuses to encode space, \n, latin letters?

2002-03-24 02:17:00
Hello Autrijus!

Anton>     On Sun, Mar 24, 2002 at 10:32:40AM +0300, Anton Tagunov wrote:
Anton>     Maybe I'm overusing your kindness :-)

Autrijus>  nope.
:-)

Anton> perl15452 -MEncode -we "print Encode::encode('gb2312',' ')"
Anton> perl15452 -MEncode -we "print Encode::encode('gb2312',qq{\n})"
Anton> perl15452 -MEncode -we "print Encode::encode('gb2312','l')"
Anton>
Anton> all refuse to encode :-(

Autrijus> This is as expected. The 'raw' gb2312 does not contain anything but
Autrijus> internal Chinese character codes, as specified in GB2312.1980-0.
Autrijus>
Autrijus> That's why it's only used in fonts, rather than as a transport
Autrijus> encoding (euc-cn does that).

Hmm... I must confess I have only scant knowledge of the CJK
encodings, so I have to rely on what people say on the Internet.

For instance, I've been relying on Ken Lunde's document at
http://www.oreilly.com/people/authors/lunde/cjk_inf.html

Here's what he says about GB2312:

o Row 1: 94 symbols
o Row 2: 72 numerals
o Row 3: 94 full-width GB 1988-89 characters (see Section 2.2.1)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
o Row 4: 83 hiragana
o Row 5: 86 katakana
o Row 6: 48 uppercase and lowercase Greek alphabet
o Row 7: 66 uppercase and lowercase Cyrillic (Russian) alphabet
o Row 8: 26 Pinyin and 37 Bopomofo characters
o Row 9: 76 line-drawing elements (09-04 through 09-79)
o Rows 16 through 55: 3,755 hanzi (Level 1 Hanzi; last is 55-89)
o Rows 56 through 87: 3,008 hanzi (Level 2 Hanzi; last is 87-94)

Shouldn't Row 3 then contain the whole of ASCII?
And if so, what should we do about CR and LF?
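
One way to probe the Row 3 question is sketched below. It is only a sketch,
and it assumes an Encode build that provides the 'euc-cn' encoding Autrijus
mentions. The helper rowcell_to_euc is a hypothetical name introduced here
for illustration; it applies the usual EUC convention of adding 0xA0 to the
row and cell numbers. The script decodes one Row 3 cell to see which Unicode
character it maps to, then pushes plain ASCII (including CR and LF) through
euc-cn to see what bytes come out.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: GB 2312 addresses characters by row and cell
# (1..94 each); EUC-CN stores the pair as two bytes, each offset by 0xA0.
sub rowcell_to_euc {
    my ($row, $cell) = @_;
    return pack('C2', 0xA0 + $row, 0xA0 + $cell);
}

# Row 3 parallels printable ASCII 0x21..0x7E, so the counterpart of
# 'A' (0x41) sits in cell 0x41 - 0x20 = 33.
my $euc  = rowcell_to_euc(3, ord('A') - 0x20);
my $char = decode('euc-cn', $euc);
printf "row 3, cell 33 -> bytes %s -> U+%04X\n",
    uc(unpack('H*', $euc)), ord($char);

# Plain ASCII, including space, CR and LF, passed through euc-cn.
my $ascii = encode('euc-cn', "l \r\n");
printf "ASCII through euc-cn -> %s\n", unpack('H*', $ascii);
```

The first line printed reports bytes A3C1 and U+FF21, which is FULLWIDTH
LATIN CAPITAL LETTER A rather than the ASCII letter: Row 3 holds full-width
forms, not ASCII itself. The second line prints 6c200d0a, showing the
letter, space, CR and LF going through euc-cn as unchanged single bytes.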

With warmest regards, Anton