Greetings. My apologies if this has been brought up on the list
before; I couldn't find a pointer to the archives (if they exist).
ord appears to return the wrong value for single characters taken from a
UTF-8 string with split or substr:
=====
use utf8;
# Three characters: "a", U+0222, "a"
$a = "\x{0061}\x{0222}\x{0061}";
print "The whole length is ", length($a), "\n";

# Case 1: split the string into single characters
@b = split(//, $a);
foreach $c (@b) {
    print "The ord in the split is ", ord($c);
    if ($c eq "\x{0222}") { print " and is equal to U+0222" }
    print "\n";
}

# Case 2: extract each character with substr
for ($i = 0; $i < length($a); $i++) {
    $c = substr($a, $i, 1);
    print "The ord in the index is ", ord($c);
    if ($c eq "\x{0222}") { print " and is equal to U+0222" }
    print "\n";
}

# Case 3: ord on a literal single-character string
$d = "\x{0222}";
print "The ord outside the split is ", ord($d), "\n";
=====
In 5.6.0, this produces:
The whole length is 3
The ord in the split is 97
The ord in the split is 200 and is equal to U+0222
The ord in the split is 97
The ord in the index is 97
The ord in the index is 200 and is equal to U+0222
The ord in the index is 97
The ord outside the split is 546
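For what it's worth, 200 is 0xC8, the first byte of the two-byte UTF-8
encoding of U+0222 (0xC8 0xA2), so ord seems to be reading the raw lead
byte instead of the whole character, even though eq still matches. The
sketch below just prints those bytes; it assumes a newer Perl with the
core Encode module (5.8 or later), so it will not run on 5.6.0 itself.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+0222 as a one-character string (no "use utf8" needed for \x{...} escapes)
my $char = "\x{0222}";

# Encode the character to UTF-8 and list the value of each resulting byte
my @bytes = unpack("C*", encode("UTF-8", $char));

printf "U+%04X as UTF-8: %s\n", ord($char),
    join(" ", map { sprintf "0x%02X (%d)", $_, $_ } @bytes);
```

On such a Perl this should print "U+0222 as UTF-8: 0xC8 (200) 0xA2 (162)",
which lines up with the 200 reported above.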
Has anyone else come across this? Is there a way to get the correct ord
value for characters produced by split (or substr) in a loop?