URL:
<http://savannah.nongnu.org/bugs/?35723>
Summary: Chinese GB2312 character set is missing the newer
GB18030 characters
Project: MHonArc
Submitted by: ssb22
Submitted on: Mon 05 Mar 2012 11:07:59 GMT
Category: Character Sets
Severity: 3 - Normal
Priority: 5 - Normal
Item Group: None
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Operating System: All
Perl Version: 5.10.0
Component Version: 1.3
Fixed Release:
_______________________________________________________
Details:
Many Chinese systems send mail with a charset label of GB2312 but the actual
charset is GBK or GB18030, which are both supersets of GB2312. The GB2312
label is designed to trick older Chinese software into displaying as many
characters as it can and ignoring the rest, but newer Chinese software treats
a "GB2312" label as GB18030. However, MHonArc's
lib/perl5/site_perl/5.10.0/MHonArc/UTF8/GB2312.pm and
lib/perl5/site_perl/5.10.0/MHonArc/CharEnt/GB2312.pm cover only the original
GB2312 standard, not the full GB18030.
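
The superset relationship can be checked with Python's standard codecs. The
following is a minimal sketch and not part of the attached script; the sample
text is an illustrative assumption, chosen because U+9555 is a well-known
character that exists in GBK and GB18030 but not in GB2312.

```python
# Sketch: a GBK-only character sent under a "GB2312" label.
# The sample text is an illustrative assumption, not taken from a real message.
text = "朱镕基"  # U+9555 (the middle character) is not in GB2312

raw = text.encode("gbk")  # bytes a mislabelled mailer would send


def try_decode(data, charset):
    """Return the decoded string, or None if the bytes are invalid for charset."""
    try:
        return data.decode(charset)
    except UnicodeDecodeError:
        return None


strict_gb2312 = try_decode(raw, "gb2312")   # strict GB2312 rejects the bytes
as_gb18030 = try_decode(raw, "gb18030")     # GB18030 decodes them

# Text limited to GB2312 has identical bytes in all three encodings.
plain = "中文邮件"
same_bytes = plain.encode("gb2312") == plain.encode("gbk") == plain.encode("gb18030")
```

A strict GB2312 decoder therefore loses the message, while treating the same
bytes as GB18030 costs nothing for messages that really are GB2312.
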
Attached is a Python script that generates the relevant files for GB18030 and
names them GB2312. Additionally, it would be nice if MHonArc recognised a
"charset=GBK" or "charset=GB18030" header (search for gb2312 in Char.pm,
CharEnt.pm and UTF8/MhaEncode.pm and add the other mappings).
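
The attached makeGB.py is not reproduced in this report. As a rough sketch of
the idea only, the two-byte part of a GB18030-to-Unicode table can be derived
from Python's built-in gb18030 codec. The function name and the dictionary
layout below are assumptions for illustration and do not match the format of
MHonArc's table files.

```python
def build_two_byte_table():
    """Map each valid two-byte GB18030 sequence to its Unicode code point."""
    table = {}
    for lead in range(0x81, 0xFF):          # lead bytes 0x81..0xFE
        for trail in range(0x40, 0xFF):     # trail bytes 0x40..0xFE
            if trail == 0x7F:               # 0x7F is never a trail byte
                continue
            seq = bytes([lead, trail])
            try:
                char = seq.decode("gb18030")
            except UnicodeDecodeError:
                continue                    # not assigned in the codec
            if len(char) == 1:
                table[seq] = ord(char)
    return table


table = build_two_byte_table()
```

Characters from the original GB2312 standard keep their codes in this table,
and the GBK additions appear alongside them. Four-byte GB18030 sequences are
outside the scope of this sketch.
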
It seems MHonArc sometimes bypasses these tables, so I also put

    if ($from_enc eq 'gb2312') {
        $from_enc = 'gb18030';
    }

into MHonArc/Encode.pm's _encode_from_to and _unimap_from_to, and

    if ($charset eq 'gb2312') {
        $charset = 'gb18030';
    }

into UTF8/Encode.pm.
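
The same substitution, combined with the requested recognition of GBK and
GB18030 labels, can be sketched in Python as follows. This is not MHonArc
code: normalize_charset, decode_body and GB_ALIASES are hypothetical names,
and the alias set is limited to the three charsets named in this report.

```python
# Labels that should all be decoded as GB18030 (assumed alias set).
GB_ALIASES = {"gb2312", "gbk", "gb18030"}


def normalize_charset(label):
    """Lower-case a charset label and widen the GB family to gb18030."""
    name = label.strip().strip('"').lower()
    return "gb18030" if name in GB_ALIASES else name


def decode_body(raw, label):
    """Decode message bytes using the widened charset."""
    return raw.decode(normalize_charset(label), errors="replace")
```

Doing the substitution once, where the label is first read, avoids having to
patch every code path that might bypass the mapping tables.
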
_______________________________________________________
File Attachments:
-------------------------------------------------------
Date: Mon 05 Mar 2012 11:07:59 GMT Name: makeGB.py Size: 2kB By: ssb22
<http://savannah.nongnu.org/bugs/download.php?file_id=25266>
_______________________________________________________
Reply to this item at:
<http://savannah.nongnu.org/bugs/?35723>
_______________________________________________
Message sent via/by Savannah
http://savannah.nongnu.org/