Re: Prototype for decode_utf8 incorrect?

Autrijus,

Thanks for the report :) -- Murphy's law strikes :(

On Friday, Sep 26, 2003, at 17:23 Asia/Tokyo, Autrijus Tang wrote:

$ perl -MEncode -e'print Encode::decode_utf8(1, 1)'
Too many arguments for Encode::decode_utf8 at -e line 1, at end of line

$ perldoc Encode |grep decode_utf8
       $string = decode_utf8($octets [, CHECK]);


A tricky bug you have found.  Here is what the documentation says.

       $string = decode_utf8($octets [, CHECK]);
         equivalent to "$string = decode("utf8", $octets [, CHECK])". The
         sequence of octets represented by $octets is decoded from UTF-8 into
         a sequence of logical characters. Not all sequences of octets form
         valid UTF-8 encodings, so it is possible for this call to fail. For
         CHECK, see "Handling Malformed Data".


and here is how it is really implemented:

sub decode_utf8($)
{
    my ($str) = @_;
    return undef unless utf8::decode($str);
    return $str;
}

which is RIGHT so long as the prototype of utf8::decode() is '$'

% perl -e 'print utf8::decode()'
Usage: utf8::decode(sv) at -e line 1.
% perl -e 'print utf8::decode(1)'
1
% perl -le 'print utf8::decode(1,1)'
Usage: utf8::decode(sv) at -e line 1.


and utf8::decode is not designed to return status.

% perl -MEncode -e 'print decode_utf8("\xC2\x80")' | hexdump -C
00000000  80                                                |.|
00000001
% perl -MEncode -e 'print decode_utf8("\x80")' | hexdump -C
% perl -MEncode -e 'print decode_utf8("\x7f")' | hexdump -C
00000000  7f                                                |.|
00000001
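
For anyone following along: the "Too many arguments" error in the original report comes from the ($) prototype, which perl enforces when the call is compiled. A minimal sketch of that behaviour, using a hypothetical sub one_arg (not anything in Encode):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sub declared with a ($) prototype.
sub one_arg ($) { return scalar @_ }

# The two-argument call is rejected when the string is compiled,
# so wrap it in a string eval to capture the message.
my $compiled = eval q{ one_arg(1, 1); 1 };
my $err = $compiled ? '' : $@;
print "two args: $err";

# Calling with & bypasses the prototype check entirely.
my $count = &one_arg(1, 1);
print "with &: $count argument(s) accepted\n";
```

This is only meant to show where the error is raised; it says nothing about how CHECK should be handled once the extra argument is allowed.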

I consider this a feature bug rather than a documentation bug, but I wonder how I should fix it. Fixing utf8::decode() involves tweaking the core, so it would be nice if it can be fixed on the Encode side. Fortunately, Encode::decode("utf8" => $str) works.

% perl -MEncode -e '$a="\xC2\x80"; print decode("utf8"=>$a, 1)' |hexdump -C
00000000  80                                                |.|
00000001
% perl -MEncode -e '$a="\x80"; print decode("utf8"=>$a, 1)' | hexdump -C
utf8 "\x80" does not map to Unicode at /usr/local/lib/perl5/5.8.0/i386-freebsd/Encode.pm line 164.
% perl -MEncode -e '$a="\x7f"; print decode("utf8"=>$a, 1)' | hexdump -C
00000000  7f                                                |.|
00000001


So we can implement decode_utf8() as follows:

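sub decode_utf8($;$)
{
    my ($str, $check) = @_;
    if ($check){
        # pass scalars explicitly: decode() has a ($$;$) prototype,
        # so a bare @_ would be evaluated in scalar context
        return decode("utf8", $str, $check);
    }else{
        return undef unless utf8::decode($str);
        return $str;
    }
}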
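
A rough way to exercise both paths of the proposed sub is sketched below. The name my_decode_utf8 is hypothetical (chosen so it does not clash with Encode's own decode_utf8), and FB_CROAK is used as the CHECK value on the assumption that croaking on malformed input is the behaviour wanted:

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical name; mirrors the proposed decode_utf8($;$).
sub my_decode_utf8 ($;$) {
    my ($str, $check) = @_;
    if ($check) {
        # With CHECK set, defer to Encode::decode so malformed input is reported.
        return Encode::decode("utf8", $str, $check);
    }
    # Without CHECK, keep the old behaviour: undef on malformed input.
    return undef unless utf8::decode($str);
    return $str;
}

# Valid two-octet sequence, no CHECK.
my $ok = my_decode_utf8("\xC2\x80");
printf "valid, no CHECK: U+%04X\n", ord $ok;

# Malformed single octet, no CHECK.
my $bad = my_decode_utf8("\x80");
print "malformed, no CHECK: ", (defined($bad) ? "defined" : "undef"), "\n";

# Malformed single octet with CHECK: trap the croak with eval.
my $died = !eval { my_decode_utf8("\x80", Encode::FB_CROAK()); 1 };
print "malformed, CHECK=FB_CROAK: ", ($died ? "croaked" : "returned"), "\n";
```

The point of the split is that the one-argument call keeps its old quiet behaviour, while callers who pass CHECK get the same error reporting as decode("utf8", ...).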

Dan the Encode Maintainer
