perl-unicode

Re: [Encode] "\x{df}" is NOT UTF-8 (Was: [Re: Encode anomalies ...)

2002-03-30 03:42:41
On Sat, 30 Mar 2002 18:39:37 +0900, Dan Kogai 
<dankogai(_at_)dan(_dot_)co(_dot_)jp> said:

  > On Saturday, March 30, 2002, at 04:57 , Andreas J. Koenig wrote:
All the warnings below seem bogus to me.

% /usr/local/perl-5.7.3@15620/bin/perl -wle '
use Encode qw(from_to);
$x = "\x{df}";
from_to($x,"utf-8","iso8859-1");
'
Use of uninitialized value in subroutine entry at
/usr/local/perl-5.7.3@15620/lib/5.7.3/i686-linux-thread-multi/Encode.pm
line 200.
Use of uninitialized value in subroutine entry at
/usr/local/perl-5.7.3@15620/lib/5.7.3/i686-linux-thread-multi/Encode.pm
line 200.
[snip]

  >    I was confused by this one until I had read it through to the
  > end of your report.  "\x{df}" is NOT UTF-8!  Let's see this one one
  > more time.

Sorry for the confusion: I DID KNOW. I was complaining about the
inappropriate error message when I said:

    For the above I would expect something like "illegal character in
    string".


  > use Encode qw(encode_utf8 from_to);
  > $x = encode_utf8("\x{df}");
  > from_to($x,"utf-8","iso8859-1");

  >    And the following will warn like this:

  >   "\N{U+100}" does not map to iso-8859-1 at
  > /usr/home/dankogai/work/Encode/blib/lib/Encode.pm line 200.
  > Use of uninitialized value in length at
  > /usr/home/dankogai/work/Encode/blib/lib/Encode.pm line 202.

  >    The first one is a good, informative warning

No, it isn't. \N{U+100} is not valid Perl. I believe this pseudo-patch
to Encode.xs is needed:

                        Perl_warner(aTHX_ packWARN(WARN_UTF8),
-                                   "\"\\N{U+%" UVxf
+                                   "\"\\x{%" UVxf
                                    "}\" does not map to %s", ch,

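The following Perl sketch shows the message that the patched format string would produce. It assumes UVxf expands to a lowercase hexadecimal conversion, as "%lx" does on common platforms. The helper unmapped_warning is hypothetical; the real text is built in C by Perl_warner(). Because the \x{...} form is ordinary Perl string syntax, the quoted part of the message can be fed back to perl to recover the offending character.

```perl
use strict;
use warnings;

# Hypothetical Perl mirror of the patched C format string in Encode.xs:
#   "\"\\x{%" UVxf "}\" does not map to %s"   (UVxf assumed to be "lx")
sub unmapped_warning {
    my ($ch, $enc) = @_;
    return sprintf(q{"\x{%x}" does not map to %s}, $ch, $enc);
}

my $msg = unmapped_warning(0x100, 'iso-8859-1');
print "$msg\n";    # "\x{100}" does not map to iso-8859-1

# The quoted part is valid Perl, so evaluating it yields the character back.
my ($literal) = $msg =~ /^("\\x\{[0-9a-f]+\}")/;
my $char = eval $literal;
printf "U+%04X\n", ord $char;    # U+0100
```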


  >    but the second one is not.
  >    It is encode() that is warning.  The following pseudo-diff-u will
  > fix it.

  >   sub from_to
  >   {
  >       my ($string,$from,$to,$check) = @_;
  >       my $f = find_encoding($from);
  >       croak("Unknown encoding '$from'") unless defined $f;
  >       my $t = find_encoding($to);
  >       croak("Unknown encoding '$to'") unless defined $t;
  >       my $uni = $f->decode($string,$check);
  >       return undef if ($check && length($string));
  >       $string = $t->encode($uni,$check);
  >       return undef if ($check && length($uni));
  > -    return length($_[0] = $string);
  > +    return defined($_[0] = $string) ? length($string) : undef ;
  >   }
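Here is a self-contained sketch of the patched from_to() for experimenting outside Encode.pm. It is named my_from_to (hypothetical) so that it does not clash with Encode's own from_to. It also defaults an undefined CHECK to 0, an assumption not present in the diff, so that "use warnings" does not report an uninitialized value. The final return line is the patch itself: length() is taken only when the encoded result is defined. How decode() and encode() behave with a true CHECK depends on the installed Encode version.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Encode qw(find_encoding);

# Hypothetical standalone copy of the patched Encode::from_to().
# Converts $_[0] in place from $from to $to and returns the new length,
# or undef if a true $check made decode()/encode() stop early.
sub my_from_to {
    my ($string, $from, $to, $check) = @_;
    $check = 0 unless defined $check;    # assumption: avoid undef warnings
    my $f = find_encoding($from);
    croak("Unknown encoding '$from'") unless defined $f;
    my $t = find_encoding($to);
    croak("Unknown encoding '$to'") unless defined $t;
    my $uni = $f->decode($string, $check);
    return undef if ($check && length($string));
    $string = $t->encode($uni, $check);
    return undef if ($check && length($uni));
    # The fix: only take length() of a defined result.
    return defined($_[0] = $string) ? length($string) : undef;
}

my $x   = "caf\xc3\xa9";                         # "caf" + U+00E9 as UTF-8 bytes
my $len = my_from_to($x, 'utf-8', 'iso-8859-1');
print "$len\n";                                  # 4
```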

Thanks!

-- 
andreas
