mhonarc-dev

[bug #35723] Chinese GB2312 character set is missing the newer GB18030 characters

2012-03-05 05:08:23
URL:
  <http://savannah.nongnu.org/bugs/?35723>

                 Summary: Chinese GB2312 character set is missing the newer
                          GB18030 characters
                 Project: MHonArc
            Submitted by: ssb22
            Submitted on: Mon 05 Mar 2012 11:07:59 GMT
                Category: Character Sets
                Severity: 3 - Normal
                Priority: 5 - Normal
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
        Operating System: All
            Perl Version: 5.10.0
       Component Version: 1.3
           Fixed Release: 

    _______________________________________________________

Details:

Many Chinese systems send mail labelled with charset GB2312 when the actual
charset is GBK or GB18030, both of which are supersets of GB2312.  The GB2312
label is intended to get older Chinese software to display as many characters
as it can and ignore the rest, while newer Chinese software treats a "GB2312"
label as GB18030.  However, MHonArc's
lib/perl5/site_perl/5.10.0/MHonArc/UTF8/GB2312.pm and
lib/perl5/site_perl/5.10.0/MHonArc/CharEnt/GB2312.pm cover only the original
GB2312 standard, not the full GB18030.
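
To illustrate the mismatch described above, here is a minimal Python sketch
(not MHonArc code). It uses Python's built-in gb2312 and gb18030 codecs as
stand-ins for MHonArc's tables; the helper name decode_as is hypothetical.
It shows that text containing a GBK-only character cannot be decoded by a
strict GB2312 decoder, while a GB18030 decoder handles both that text and
plain GB2312 text.

```python
from typing import Optional


def decode_as(raw: bytes, charset: str) -> Optional[str]:
    """Return the decoded text, or None if the bytes are invalid in charset."""
    try:
        return raw.decode(charset)
    except UnicodeDecodeError:
        return None


# "們" is a traditional character that GBK and GB18030 cover but GB2312 does not.
mislabelled = "我們".encode("gb18030")

# A strict GB2312 decoder rejects the bytes; a GB18030 decoder accepts them.
strict_result = decode_as(mislabelled, "gb2312")
superset_result = decode_as(mislabelled, "gb18030")

# Bytes that really are GB2312 decode to the same text under GB18030.
plain = "中文".encode("gb2312")
plain_result = decode_as(plain, "gb18030")
```

Because GB18030 decodes genuine GB2312 bytes unchanged, treating the label as
GB18030 should not affect mail that really is limited to GB2312.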

Attached is a Python script that generates the relevant files for GB18030
under the GB2312 name.  Additionally, it would be nice if MHonArc recognised a
"charset=GBK" or "charset=GB18030" header (search for gb2312 in Char.pm,
CharEnt.pm and UTF8/MhaEncode.pm and add the other mappings).

It seems MHonArc sometimes bypasses these tables, so I also put

    if ($from_enc eq 'gb2312') {
        $from_enc = 'gb18030';
    }

into MHonArc/Encode.pm's _encode_from_to and _unimap_from_to, and

    if ($charset eq 'gb2312') {
        $charset = 'gb18030';
    }

into UTF8/Encode.pm.
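
The same remapping idea, extended to the GBK and GB18030 labels requested
above, can be sketched in Python for illustration.  This is not MHonArc code:
GB_LABELS, effective_charset and decode_body are hypothetical names, and the
sketch assumes that decoding every label in the family as GB18030 is
acceptable.

```python
# Hypothetical set of labels that should all be decoded as GB18030.
GB_LABELS = {"gb2312", "gbk", "gb18030"}


def effective_charset(label: str) -> str:
    """Normalise a charset label, widening the GB family to GB18030."""
    name = label.strip().lower()
    return "gb18030" if name in GB_LABELS else name


def decode_body(raw: bytes, label: str) -> str:
    """Decode message bytes using the widened charset for their label."""
    return raw.decode(effective_charset(label))


# A body encoded as GBK but labelled GB2312, as described in the report.
body = "我們".encode("gbk")
text = decode_body(body, "GB2312")
```

Other aliases that mail software may emit are not covered here and would need
to be added to the set.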



    _______________________________________________________

File Attachments:


-------------------------------------------------------
Date: Mon 05 Mar 2012 11:07:59 GMT  Name: makeGB.py  Size: 2kB   By: ssb22

<http://savannah.nongnu.org/bugs/download.php?file_id=25266>

    _______________________________________________________

Reply to this item at:

  <http://savannah.nongnu.org/bugs/?35723>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.nongnu.org/

---------------------------------------------------------------------
To sign-off this list, send email to majordomo(_at_)mhonarc(_dot_)org with the
message text UNSUBSCRIBE MHONARC-DEV

  • [bug #35723] Chinese GB2312 character set is missing the newer GB18030 characters, Silas S. Brown <=