perl-unicode

chr(0xE3).chr(0x81).chr(0x82) =~ /^\x{3042}$/; # match!

2003-04-07 02:30:05
Porters,

  A Perl 5.8.0 user found this by accident.

#
use strict;
use warnings;
$\ = "\n";

use encoding "utf8";
my $e = chr(0xE3).chr(0x81).chr(0x82);
print $e                            =~ /^\x{3042}$/ ? 'true' : 'false';
print chr(0xE3).chr(0x81).chr(0x82) =~ /^\x{3042}$/ ? 'true' : 'false';
__END__

This prints "false" for the first test but "true" for the second. U+3042 (HIRAGANA LETTER A) is \xE3\x81\x82 in UTF-8, so the two may match bytewise, but the UTF8 flag on chr(0xE3).chr(0x81).chr(0x82) is off, so it should not match (regardless of use utf8 or use bytes). The first result is therefore correct, but the second is not.
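One detail worth ruling out before reading the second result as a regex bug: in Perl's operator precedence, =~ binds more tightly than the . concatenation operator, so the second test only matches chr(0x82) against the pattern, and the leftover two-character string is what the ?: tests. The sketch below is a minimal illustration under a modern Perl. It assumes the core Encode module's decode() and leaves out use encoding, which was deprecated in later releases; the variable names are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Three code points, U+00E3 U+0081 U+0082. They are the UTF-8 bytes of
# U+3042, but nothing has decoded them, so the UTF8 flag stays off.
my $bytes = chr(0xE3) . chr(0x81) . chr(0x82);

# =~ binds tighter than ., so this parses as
#   chr(0xE3) . chr(0x81) . (chr(0x82) =~ /^\x{3042}$/)
# The match fails and yields "", but the remaining two-character string
# is still true, so the ternary returns 'true'.
my $unparenthesized =
    chr(0xE3).chr(0x81).chr(0x82) =~ /^\x{3042}$/ ? 'true' : 'false';

# With parentheses the whole three-character string is matched.
my $parenthesized =
    (chr(0xE3).chr(0x81).chr(0x82)) =~ /^\x{3042}$/ ? 'true' : 'false';

# Decoding the bytes as UTF-8 yields the single character U+3042.
my $char    = decode('UTF-8', $bytes);
my $decoded = $char =~ /^\x{3042}$/ ? 'true' : 'false';

print "unparenthesized: $unparenthesized\n";   # true
print "parenthesized:   $parenthesized\n";     # false
print "decoded:         $decoded\n";           # true
printf "UTF8 flag: bytes=%d char=%d\n",
    utf8::is_utf8($bytes) ? 1 : 0,
    utf8::is_utf8($char)  ? 1 : 0;             # bytes=0 char=1
```

Writing the match as (chr(0xE3).chr(0x81).chr(0x82)) =~ /^\x{3042}$/, or binding the string to a variable first as the first test in the report does, keeps the comparison on the full string.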

my $name = "\x{5c0f}\x{98fc} \x{5f3e}"; # KOGAI, Dan
