perl-unicode

[Encode] "\x{df}" is NOT UTF-8 (Was: [Re: Encode anomalies ...)

2002-03-30 02:39:52
On Saturday, March 30, 2002, at 04:57 , Andreas J. Koenig wrote:
All the warnings below seem bogus to me.

    % /usr/local/perl-5.7.3@15620/bin/perl -wle '
    use Encode qw(from_to);
    $x = "\x{df}";
    from_to($x,"utf-8","iso8859-1");
    '
Use of uninitialized value in subroutine entry at /usr/local/perl-5.7.3@15620/lib/5.7.3/i686-linux-thread-multi/Encode.pm line 200.
Use of uninitialized value in subroutine entry at /usr/local/perl-5.7.3@15620/lib/5.7.3/i686-linux-thread-multi/Encode.pm line 200.
[snip]

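I was confused by this one until I read through to the end of your report. "\x{df}" is NOT UTF-8! It is a single character (U+00DF), so it has to be encoded into UTF-8 octets before from_to() can treat it as UTF-8. Let's look at it one more time.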

use Encode qw(encode_utf8 from_to);
$x = encode_utf8("\x{df}");
from_to($x,"utf-8","iso8859-1");
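
To make the difference concrete, here is a minimal sketch (assuming a reasonably recent Encode; the variable names are only for illustration) that dumps the octets before and after the conversion. It spells the target encoding "iso-8859-1" rather than relying on the alias.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8 from_to);

# "\x{df}" is one character (U+00DF), not a UTF-8 byte sequence.
my $char = "\x{df}";

# encode_utf8() turns that character into its UTF-8 octets.
my $octets   = encode_utf8($char);
my $hex_utf8 = join ' ', map { sprintf '%02X', ord } split //, $octets;

# from_to() converts the octets in place and returns the new length.
my $len        = from_to($octets, "utf-8", "iso-8859-1");
my $hex_latin1 = join ' ', map { sprintf '%02X', ord } split //, $octets;

print "UTF-8 octets:   $hex_utf8\n";                   # C3 9F
print "Latin-1 octets: $hex_latin1 (length $len)\n";   # DF (length 1)
```

The UTF-8 form is two octets, while the ISO-8859-1 form is the single octet 0xDF. Handing the unencoded "\x{df}" to from_to() as "utf-8" gives the decoder a lone 0xDF, which is not a valid UTF-8 sequence.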

  And the following will warn like this:

"\N{U+100}" does not map to iso-8859-1 at /usr/home/dankogai/work/Encode/blib/lib/Encode.pm line 200.
Use of uninitialized value in length at /usr/home/dankogai/work/Encode/blib/lib/Encode.pm line 202.

  The first one is a good, informative warning, but the second one is not.
It is encode() that is warning. The following pseudo-diff (in diff -u style) will fix it.

 sub from_to
 {
     my ($string,$from,$to,$check) = @_;
     my $f = find_encoding($from);
     croak("Unknown encoding '$from'") unless defined $f;
     my $t = find_encoding($to);
     croak("Unknown encoding '$to'") unless defined $t;
     my $uni = $f->decode($string,$check);
     return undef if ($check && length($string));
     $string = $t->encode($uni,$check);
     return undef if ($check && length($uni));
-    return length($_[0] = $string);
+    return defined($_[0] = $string) ? length($string) : undef ;
 }
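
The changed return expression can be tried in isolation. This is only a sketch: assign_and_measure() is a hypothetical helper, not part of Encode, that copies its second argument into the first and reports the length only when the value is defined.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched return expression:
# assign through the @_ alias, then report the length only if the
# assigned value is defined.
sub assign_and_measure {
    my $string = $_[1];
    return defined($_[0] = $string) ? length($string) : undef;
}

my $target = "old";

# Defined value: the caller's variable is updated and the length returned.
my $ok = assign_and_measure($target, "abc");
print "length: $ok, target: $target\n";

# Undefined value: undef is returned and length() is never called on it.
my $bad = assign_and_measure($target, undef);
print "result is ", (defined $bad ? "defined" : "undef"), "\n";
```

Because length() is reached only on the defined branch, the "uninitialized value in length" warning cannot come from this line, and the caller can tell failure from success by testing the return value with defined().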

I have also added documentation on the return value of from_to().
But handling a "raw" "\x{}" is beyond me.  Any suggestions?  NI-S? jhi?

Dan the Encode Maintainer
