perl-unicode

Re: chr(0xE3).chr(0x81).chr(0x82) =~ /^\x{3042}$/; # match!

2003-04-07 09:30:05
Dan Kogai wrote:
use strict;
use warnings;
$\ = "\n";

use encoding "utf8";
my $e = chr(0xE3).chr(0x81).chr(0x82);
print $e                            =~ /^\x{3042}$/ ? 'true' : 'false';
print chr(0xE3).chr(0x81).chr(0x82) =~ /^\x{3042}$/ ? 'true' : 'false';
__END__

This prints "false" for the first test but "true" for the second.  U+3042 
(HIRAGANA LETTER A) in UTF-8 is \xE3\x81\x82, so bytewise they may match, 
but the UTF8 flag for chr(0xE3).chr(0x81).chr(0x82) is off, so it should 
not match (regardless of use (utf8|bytes)).  So the first result is okay 
but the second one is not.
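
For reference, here is a minimal sketch of the byte-versus-character
distinction described above, run without the encoding pragma.  It assumes
the core Encode module is available; the variable names are illustrative
only and do not come from the original script.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Three separate characters U+00E3, U+0081, U+0082 (a byte string).
my $bytes = chr(0xE3) . chr(0x81) . chr(0x82);

# Decoding those three bytes as UTF-8 yields the single character U+3042.
my $chars = decode('UTF-8', $bytes);

my $bytes_match = $bytes =~ /^\x{3042}$/ ? 'true' : 'false';
my $chars_match = $chars =~ /^\x{3042}$/ ? 'true' : 'false';

printf "bytes: length=%d match=%s\n", length($bytes), $bytes_match;
printf "chars: length=%d match=%s\n", length($chars), $chars_match;
```

Without any pragma in effect, only the explicitly decoded string matches
U+3042, which is the behavior the paragraph above expects in both cases.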

Question:
I don't understand why chr(0xE3).chr(0x81).chr(0x82) should be
treated differently from "\xe3\x81\x82", given that constant folding
happens at compile time on concatenation of constant strings.
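
As a baseline for that comparison, here is a small sketch showing the two
spellings with no encoding pragma loaded.  It says nothing about what
happens under "use encoding", which is the open question here; the
variable names are made up for illustration.

```perl
use strict;
use warnings;

# The same three code points, built two different ways.
my $from_chr     = chr(0xE3) . chr(0x81) . chr(0x82);
my $from_literal = "\xe3\x81\x82";

# Compare the strings and inspect the internal UTF8 flag of each.
my $same         = $from_chr eq $from_literal ? 'same' : 'different';
my $chr_flag     = utf8::is_utf8($from_chr)     ? 'on' : 'off';
my $literal_flag = utf8::is_utf8($from_literal) ? 'on' : 'off';

print "strings are $same\n";
print "UTF8 flag: chr=$chr_flag literal=$literal_flag\n";
```

With no pragma, the two strings compare equal and neither carries the
UTF8 flag, so any difference in the original script would have to come
from the pragma.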

-- 
Untried is not *NIX
