On Jan 01, 2004, at 21:49, Masanori HATA wrote:
Sorry, no. The case I would like to point out does not seem
to be fatal: Perl does not die, but it treats the tainted
value as a non-UTF-8 string.
My sample code is below (test.pl):
-------------------------------------------------
utf8::decode(my $text0 = "\x{3042}" ); # clean
utf8::decode(my $arg = $ARGV[0] ); # tainted
utf8::decode(my $text1 = "$arg$text0"); # tainted
utf8::decode(my $text2 = "$text0$arg"); # tainted
print length($text1), "\n";
print length($text2), "\n";
-------------------------------------------------
Aha! I see your point at last. And I found your argument was correct.
When I run this code with 'perl -T test.pl a', the result is:
To clarify your point, I have modified your script to use Devel::Peek.
Pay attention to the $text1 result.
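The modified script itself is not shown in this message. A minimal sketch of what it might look like, assuming it only adds Devel::Peek's Dump() before each length() call (which matches the order of the output below); the use strict / use warnings lines are an addition as well:

```perl
use strict;
use warnings;
use Devel::Peek;    # exports Dump(), which writes to STDERR

# Same four assignments as in test.pl.
utf8::decode(my $text0 = "\x{3042}"   ); # clean
utf8::decode(my $arg   = $ARGV[0]     ); # tainted under -T
utf8::decode(my $text1 = "$arg$text0" ); # tainted
utf8::decode(my $text2 = "$text0$arg" ); # tainted

# Dump the internals of each result, then print its length.
Dump($text1);
print length($text1), "\n";
Dump($text2);
print length($text2), "\n";
```

Run it as 'perl test.pl a' and 'perl -T test.pl a' to compare the FLAGS lines of the two dumps.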
without -T
% perl test.pl a
SV = PV(0x812354) at 0x80a960
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
PV = 0x428090 "a\343\201\202"\0 [UTF8 "a\x{3042}"]
CUR = 4
LEN = 5
2
SV = PV(0x812e10) at 0x80f2a8
REFCNT = 1
FLAGS = (PADBUSY,PADMY,POK,pPOK,UTF8)
PV = 0x405150 "\343\201\202a"\0 [UTF8 "\x{3042}a"]
CUR = 4
LEN = 5
2
with -T
% perl -T test.pl a
SV = PVMG(0x819a88) at 0x80a954
REFCNT = 1
FLAGS = (PADBUSY,PADMY,GMG,SMG,pPOK)
IV = 0
NV = 0
PV = 0x428540 "a\343\201\202"\0
CUR = 4
LEN = 5
MAGIC = 0x405480
MG_VIRTUAL = &PL_vtbl_taint
MG_TYPE = PERL_MAGIC_taint(t)
MG_LEN = 1
4
SV = PVMG(0x819af4) at 0x80f69c
REFCNT = 1
FLAGS = (PADBUSY,PADMY,GMG,SMG,pPOK,UTF8)
IV = 0
NV = 0
PV = 0x4054e0 "\343\201\202a"\0 [UTF8 "\x{3042}a"]
CUR = 4
LEN = 5
MAGIC = 0x4010d0
MG_VIRTUAL = &PL_vtbl_taint
MG_TYPE = PERL_MAGIC_taint(t)
MG_LEN = 1
2
I am not sure how severe it is, but this is indeed a bug.
(My system is perl5.8.1 MSWin32-X86-multi-thread)
I have duplicated the result with Perl 5.8.2 on Mac OS X as well as
maintperl@21987 on FreeBSD. And using Encode::decode_utf8 does not
help either because it simply calls utf8::decode. And you can't use
Encode::decode("utf8", ...) in this particular case because
Encode::decode() checks for wide characters and croaks with "Cannot
decode string with wide characters". Hmm....
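The difference between the two calls can be sketched as follows. This is only an illustration, assuming a string that already holds the wide character U+3042; the variable names are made up for the example:

```perl
use strict;
use warnings;
use Encode ();

# A character string that already contains a wide character.
my $wide = "\x{3042}";

# utf8::decode signals failure through its return value
# and leaves the string as it was.
my $copy = $wide;
my $ok   = utf8::decode($copy);
print "utf8::decode returned ", ($ok ? "true" : "false"), "\n";

# Encode::decode dies instead, so trap the error with eval.
my $decoded = eval { Encode::decode("utf8", $wide) };
my $err     = $@;
print "Encode::decode failed: $err" unless defined $decoded;
```

So utf8::decode fails quietly, while Encode::decode() has to be wrapped in eval if the script is supposed to carry on.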
Dan the Perl5 Porter