On Friday, April 12, 2002, at 02:30, Nick Ing-Simmons wrote:
> Having hacked RFC2047 support into tkmail I have now seen some
> non-latin1 characters in a "real" perl/Tk app.
> There seem to be a few snags with mime's iso-2022-jp:
> - It failed to demand load given upper-case form ISO-2022-JP
What does Encode->VERSION say? Here is the current status on this one.
I wrote an ad hoc script as follows:
use Encode;
my $jp = "ISO-2022-JP";
Encode::encode($jp, "foo"); # should croak if you are right, NI-S
# list every Encode module that has been loaded so far
print join("\n", map{"\$INC{$_} == $INC{$_}"} grep m,^Encode/,o, keys %INC);
printf "$jp => %s\n", find_encoding($jp)->name;
# lowercase one character at a time and look the name up again
for (my $i = 0; $i < length($jp); $i++){
    my $alias = $jp;
    my $char = substr($alias,$i,1);
    substr($alias, $i, 1) = lc($char);
    printf "$alias => %s\n", find_encoding($alias)->name;
}
__END__
And here is the outcome.
% perl5.7.3 foo
$INC{Encode/Alias.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/Alias.pm
$INC{Encode/JP.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/JP.pm
$INC{Encode/Config.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/Config.pm
$INC{Encode/Encoding.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/Encoding.pm
$INC{Encode/JP/2022_JP.pm} == /Users/dankogai/lib/perl5/5.7.3/Encode/JP/2022_JP.pm
$INC{Encode/XS.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/XS.pm
$INC{Encode/JP/JIS.pm} == /Users/dankogai/lib/perl5/5.7.3/Encode/JP/JIS.pm
$INC{Encode/CJKConstants.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/CJKConstants.pm
$INC{Encode/JP/2022_JP1.pm} == /Users/dankogai/lib/perl5/5.7.3/Encode/JP/2022_JP1.pm
$INC{Encode/JP/H2Z.pm} == /Users/dankogai/lib/perl5/5.7.3/Encode/JP/H2Z.pmISO-2022-JP => iso-2022-jp
iSO-2022-JP => iso-2022-jp
IsO-2022-JP => iso-2022-jp
ISo-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-jP => iso-2022-jp
ISO-2022-Jp => iso-2022-jp
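For anyone who wants to repeat the check above without reading the output by eye, here is a rough sketch of the same loop as a Test::More script. It assumes an Encode that ships the iso-2022-jp encoding; the @variants list is just a name I made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use Test::More;

my $jp = "ISO-2022-JP";

# Collect the upper-case form, the all-lower-case form, and every
# variant with exactly one position lowercased (same idea as the
# ad hoc loop above).
my @variants = ($jp, lc $jp);
for my $i (0 .. length($jp) - 1) {
    my $alias = $jp;
    substr($alias, $i, 1) = lc substr($alias, $i, 1);
    push @variants, $alias;
}

# Each spelling should resolve to the same canonical encoding name.
for my $alias (@variants) {
    my $enc = find_encoding($alias);
    ok(defined $enc && $enc->name eq "iso-2022-jp", "$alias resolves");
}

done_testing();
```

Positions holding "-" or a digit produce a variant identical to the original, so several of the names tested are duplicates; that is harmless here.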
> - euc-jp "\xDC" does not map to Unicode (3) at
> /tools/perls/lib/5.7.3/i686-linux-multi/Encode.pm line 142.
> Will try and convert latter to a test when I have figured out what
> the offending source data is (and checking for bugs in my RFC2047
> hack).
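Until the offending data is known, a small harness like the following might help narrow it down. This is only a sketch: try_decode is a hypothetical helper, not part of Encode, and I make no claim about what the lone "\xDC" byte does on any particular Encode version, so the script just reports whichever outcome it sees.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode $bytes as $encoding and trap any croak.
# Returns (1, $string) on success or (0, $error_message) on failure.
sub try_decode {
    my ($encoding, $bytes) = @_;
    # FB_CROAK makes decode() die on malformed or unmappable input
    # instead of silently substituting a replacement character.
    my $string = eval { Encode::decode($encoding, $bytes, Encode::FB_CROAK) };
    return defined $string ? (1, $string) : (0, $@);
}

# A complete two-byte EUC-JP character (HIRAGANA LETTER A).
my ($ok, $result) = try_decode("euc-jp", "\xA4\xA2");
print $ok ? "A4 A2: decoded\n" : "A4 A2: failed: $result";

# The lone byte from the error message in the report.
($ok, $result) = try_decode("euc-jp", "\xDC");
print $ok ? "DC: decoded without error\n" : "DC: failed: $result";
```

Feeding the suspect message body through try_decode piece by piece should show which bytes trigger the error.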
Well, now that we have raw encodings we don't have to trespass on EUC to
decode iso-2022-jp (saves tr//), but there must be a way to tell which
character set a given character belongs to when you encode to iso-2022-jp.
EUC still comes in handy there.
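To illustrate the point about EUC, here is a minimal sketch comparing the two encodings of one JIS X 0208 character. It is not the code inside Encode::JP; hex_bytes and $seven_bit are names I made up, and the tr/// only covers the two-byte JIS X 0208 range.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+3042 HIRAGANA LETTER A, a JIS X 0208 character.
my $char = "\x{3042}";

my $euc = encode("euc-jp",      $char);
my $jis = encode("iso-2022-jp", $char);

# In EUC-JP a JIS X 0208 character is two bytes with the high bit set.
# Clearing the high bit gives the 7-bit code that ISO-2022-JP places
# between its escape sequences.
(my $seven_bit = $euc) =~ tr/\xA1-\xFE/\x21-\x7E/;

# Illustrative helper: render a byte string as space-separated hex.
sub hex_bytes { join " ", map { sprintf "%02X", ord } split //, $_[0] }

printf "euc-jp      : %s\n", hex_bytes($euc);
printf "iso-2022-jp : %s\n", hex_bytes($jis);
printf "7-bit form  : %s\n", hex_bytes($seven_bit);
```

The iso-2022-jp line should show the 7-bit form wrapped in ESC $ B (switch to JIS X 0208) and ESC ( B (back to ASCII). The high bit in EUC is what tells the encoder which character set a byte pair belongs to, and so which escape sequence to emit.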
At any rate, I wanted to clean up 7bit-jis, ISO-2022-JP and
ISO-2022-JP1 anyway. I'll make this today's assignment.
Dan