perl-unicode

Re: iso-2022-jp snags.

2002-04-11 12:24:19
On Friday, April 12, 2002, at 02:30, Nick Ing-Simmons wrote:
Having hacked RFC2047 support into tkmail I have now seen some
non-latin1 characters in a "real" perl/Tk app.

There seem to be a few snags with mime's iso-2022-jp:

- It failed to demand load given upper-case form ISO-2022-JP

What's Encode->VERSION say?  Here is the current status on this one.

I wrote an ad hoc script as follows:

use Encode;
my $jp = "ISO-2022-JP";
Encode::encode($jp, "foo"); # should croak if you are right, NI-S
print join("\n", map{"\$INC{$_} == $INC{$_}"} grep m,^Encode/,o, keys %INC);
printf "$jp => %s\n", find_encoding($jp)->name;
for (my $i = 0; $i < length($jp); $i++){
    my $alias = $jp;
    my $char = substr($alias,$i,1);
    substr($alias, $i, 1) = lc($char);
    printf "$alias => %s\n", find_encoding($alias)->name;
}
__END__

And here is the outcome.

% perl5.7.3 foo
$INC{Encode/Alias.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/Alias.pm
$INC{Encode/JP.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/JP.pm
$INC{Encode/Config.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/Config.pm
$INC{Encode/Encoding.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/Encoding.pm
$INC{Encode/JP/2022_JP.pm} == /Users/dankogai/lib/perl5/5.7.3/Encode/JP/2022_JP.pm
$INC{Encode/XS.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/XS.pm
$INC{Encode/JP/JIS.pm} == /Users/dankogai/lib/perl5/5.7.3/Encode/JP/JIS.pm
$INC{Encode/CJKConstants.pm} == /Users/dankogai/lib/perl5/5.7.3/darwin/Encode/CJKConstants.pm
$INC{Encode/JP/2022_JP1.pm} == /Users/dankogai/lib/perl5/5.7.3/Encode/JP/2022_JP1.pm
$INC{Encode/JP/H2Z.pm} == /Users/dankogai/lib/perl5/5.7.3/Encode/JP/H2Z.pmISO-2022-JP => iso-2022-jp
iSO-2022-JP => iso-2022-jp
IsO-2022-JP => iso-2022-jp
ISo-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-JP => iso-2022-jp
ISO-2022-jP => iso-2022-jp
ISO-2022-Jp => iso-2022-jp

- euc-jp "\xDC" does not map to Unicode (3) at
  /tools/perls/lib/5.7.3/i686-linux-multi/Encode.pm line 142.

Will try and convert latter to a test when I have figured out what
the offending source data is (and checking for bugs in my RFC2047
hack).
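
For anyone who wants to see this class of error in isolation, here is a minimal sketch. The input bytes are made up for illustration (a 0xDC lead byte followed by an ASCII letter, which is not a valid EUC-JP sequence); they are an assumption, not the data from the tkmail message. It assumes a reasonably current Encode: by default decode() substitutes U+FFFD, while FB_CROAK makes it die.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical input: 0xDC is an EUC-JP lead byte, but "a" is not a valid trail byte.
my $bytes = "\xDC" . "a";

# Default CHECK: malformed input is replaced with U+FFFD instead of dying.
my $loose = Encode::decode('euc-jp', $bytes);

# Strict: FB_CROAK makes decode() die; LEAVE_SRC keeps $bytes untouched.
my $strict_ok = eval {
    Encode::decode('euc-jp', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    1;
};
my $error = $strict_ok ? '' : $@;

printf "loose decode contains U+FFFD: %s\n", ($loose =~ /\x{FFFD}/ ? "yes" : "no");
print  "strict decode died: ", ($strict_ok ? "no" : "yes: $error"), "\n";
```

Wrapping the strict call in eval is one way to turn the croak into something a test can check, which is probably what a regression test for the real data would do.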

Well, now that we have raw encodings, we don't have to trespass on EUC to decode iso-2022-jp (which saves a tr//), but there must be a way to tell which character set a given character belongs to when you encode to iso-2022-jp. EUC still comes in handy there.
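
To illustrate the point about character sets: in iso-2022-jp the escape sequence names the set that follows, and the JIS X 0208 bytes are the EUC-JP bytes with the high bit cleared, which is what the tr// was for. A rough sketch, using a sample string chosen for illustration (it is an assumption, not data from this thread):

```perl
use strict;
use warnings;
use Encode ();

# Sample text (an assumption for illustration): "A", HIRAGANA LETTER A (U+3042), "B".
my $text = "A\x{3042}B";

my $jis  = Encode::encode('iso-2022-jp', $text);
my $back = Encode::decode('iso-2022-jp', $jis);

# EUC-JP bytes for the kana, shifted down to 7 bits: the tr// step mentioned above.
(my $kana7 = Encode::encode('euc-jp', "\x{3042}")) =~ tr/\xA1-\xFE/\x21-\x7E/;

# ESC $ B switches to JIS X 0208; ESC ( B switches back to ASCII.
my $expected = "A" . "\e\$B" . $kana7 . "\e(B" . "B";

(my $shown = $jis) =~ s/\e/<ESC>/g;
print "iso-2022-jp: $shown\n";
print "round trip ok\n" if $back eq $text;
print "matches EUC-derived bytes\n" if $jis eq $expected;
```

The encoder has to know that U+3042 lives in JIS X 0208 before it can emit the right escape, and the EUC-JP byte ranges carry exactly that information.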

At any rate, I wanted to clean up 7bit-jis, ISO-2022-JP and ISO-2022-JP1 anyway. I'll make this today's assignment.

Dan
