perl-i18n

Re: GB2312 Encoding and File Names

2011-11-22 02:28:13
Resolved by encoding to UTF-8 once the decoding had been completed:

use Encode qw(decode encode);
use Encode::MIME::EncWords;   # provides the 'MIME-EncWords' encoding
my $fname = encode('utf8',
    decode('MIME-EncWords', $head->recommended_filename));
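
For anyone finding this later, here is a minimal sketch of why the extra encode step matters, using only core modules. It assumes the file system stores names as UTF-8 bytes (typical on Linux with a UTF-8 locale). The scratch directory and the hard-coded name are only there for illustration and are not part of my real code.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempdir);

# Character string as produced by the decode step (taken from the
# Data::Dumper output quoted further down this thread).
my $chars = "DPM2007exchange\x{96fb}\x{90f5}\x{8207}\x{90f5}\x{7bb1}\x{4fee}\x{5fa9}.zip";

# Byte string for file system calls (assumption: names are UTF-8).
my $fname = encode('UTF-8', $chars);

# Create a file under that name in a scratch directory, then list it.
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/$fname" or die "open: $!";
close $fh;

opendir my $dh, $dir or die "opendir: $!";
my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

# Comparing the raw listing with the character string does not match,
# because readdir returns bytes, not characters.
my $naive_match = grep { $_ eq $chars } @entries;

# Compare bytes with bytes ...
my $bytes_match = grep { $_ eq $fname } @entries;

# ... or decode the listing and compare characters with characters.
my $chars_match = grep { decode('UTF-8', $_) eq $chars } @entries;

printf "chars=%d bytes=%d naive=%d bytes_match=%d chars_match=%d\n",
    length $chars, length $fname, $naive_match, $bytes_match, $chars_match;
```

Either comparison works as long as both sides are in the same form; mixing the decoded string with the raw directory entry is what failed for me.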
-- 
Thanks, Phil

----- Original Message -----
With some help from the PerlMonks board I have decoded the file name
correctly, but when I dump it, it does not match the physical file
name as it is stored in the file system, i.e.

MIME Header : =?gb2312?B?RFBNMjAwN2V4Y2hhbmdl64rgXcVj4F3P5NDej80uemlw?=
Decoded     : DPM2007exchange電郵與郵箱修復.zip
$VAR1 = "DPM2007exchange\x{96fb}\x{90f5}\x{8207}\x{90f5}\x{7bb1}\x{4fee}\x{5fa9}.zip";

So when one tries to compare it with what is read from a directory
listing, the two cannot be matched :( How do I get the decoded name to
be as it is meant to be, as shown above?
--
Thanks, Phil

----- Original Message -----
Just a follow-up asking for some help on this problem. I appear to be
able to decode Simplified Chinese okay, but Traditional Chinese is
somewhat more difficult.  I have the file name MIME entity:

=?gb2312?B?RFBNMjAwN2V4Y2hhbmdl64rgXcVj4F3P5NDej80uemlw?=

which should decode to:

DPM2007exchange電郵與郵箱修復.zip

but when I try to decode that name in Perl it comes out as:

DPM2007exchange���]�c�]箱修��.zip

I have installed the Encode::HanExtra module but even with that it is
still not showing correctly. Am I missing some other type of module?
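
A minimal sketch of what seems to be going on, using only the core Encode and MIME::Base64 modules. The Base64 payload is the one from the encoded-word for the DPM2007exchange file name. Treating it as GBK (cp936) rather than strict GB2312 is an assumption on my part, based on the bytes falling outside the GB2312 range.

```perl
use strict;
use warnings;
use Encode qw(decode);
use MIME::Base64 qw(decode_base64);

# Base64 payload of the =?gb2312?B?...?= encoded-word for the .zip name.
my $payload = 'RFBNMjAwN2V4Y2hhbmdl64rgXcVj4F3P5NDej80uemlw';
my $bytes   = decode_base64($payload);

# Strict GB2312 (EUC-CN) has no mapping for several of these byte pairs,
# so they come out as U+FFFD replacement characters.
my $as_gb2312 = decode('euc-cn', $bytes);

# cp936 (GBK) is a superset of GB2312 that also covers these characters.
# Assumption: the sending mail client emitted GBK under the "gb2312" label.
my $as_gbk = decode('cp936', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "gb2312: $as_gb2312\n";
print "gbk:    $as_gbk\n";
```

If that assumption holds, the charset label in the header is the problem rather than a missing module.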
--
Thanks, Phil

----- Original Message -----
Hello all,

I do hope I am in the right place for some help! I am working on a
project that requires email attachments to be extracted to the file
system. All was working great until one of our kind testers tried
with Traditional and Simplified Chinese, where I ended up with files
named ?????.txt.

I am using the MIME::Parser module to extract the files, and after
some great help from the developer I have realized that one needs to
override a method in MIME::Parser::Filer so that the correct file
names are generated.

One of the attachments in the test email is shown below:

360新闻监测-12-01-Chi Simp.txt

I have tried to use MIME::EncWords and MIME::Charset to extract the
correct name from the MIME entity using:

my $fname = decode_mimewords($head->recommended_filename);

but this still does not work :( so I tried to compare what the file
name looks like with LANG set with and without UTF8:

With LANG en_GB.UTF8

360新闻监测-12-01-Chi Simp.txt

With LANG en_GB

360�?��?��??��?-12-01-Chi Simp.txt

Now this is what happens when I extract the file with my new method:

With LANG en_GB

360���ż���-12-01-Chi Simp.txt

With LANG en_GB.UTF8

360???ż???-12-01-Chi Simp.txt
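
To illustrate why the same name can look so different, here is a small sketch (core Encode only) that dumps the bytes of the four Chinese characters in the attachment name under two encodings. My understanding, which may be incomplete, is that the bytes on disk stay the same whatever LANG is set to and only the terminal's interpretation of them changes.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The four Chinese characters from the attachment name, as code points.
my $chars = "\x{65b0}\x{95fb}\x{76d1}\x{6d4b}";

my $utf8 = encode('UTF-8', $chars);   # three bytes per character here
my $gbk  = encode('cp936', $chars);   # two bytes per character

# Hex dump of each byte string, for comparison.
my $utf8_hex = join ' ', map { sprintf '%02X', ord } split //, $utf8;
my $gbk_hex  = join ' ', map { sprintf '%02X', ord } split //, $gbk;

print "UTF-8: $utf8_hex\n";
print "GBK:   $gbk_hex\n";
```

A terminal expecting UTF-8 that is handed the GBK bytes (or the other way round) has no valid way to display them, which would explain the question marks and odd symbols.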

The MIME file name appears as:
=?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?=

This is not my area of expertise, so I am reaching out to you for some
help. How can one extract the file name from an email and have it
reflect its real Chinese name?  Hope this makes sense!
--
Thanks, Phil



