Weird interaction of ord, split, and substr with UTF-8?

Greetings. My apologies if this has been brought up on the list before; I couldn't find a pointer to the archives (if they exist).

ord, split, and substr appear to misbehave with UTF-8 strings when returning single characters:


=====
use utf8;

$a = "\x{0061}\x{0222}\x{0061}";
print "The whole length is ", length($a), "\n";

@b = split(//, $a);
foreach $c (@b) {
    print "The ord in the split is ", ord($c);
    if($c eq "\x{0222}") { print " and is equal to U+0222"}
    print "\n";
}

for($i=0; $i<length($a); $i++) {
    $c = substr($a, $i, 1);
    print "The ord in the index is ", ord($c);
    if($c eq "\x{0222}") { print " and is equal to U+0222"}
    print "\n";
}

$d = "\x{0222}";
print "The ord outside the split is ", ord($d), "\n";
=====
In 5.6.0, this produces:

The whole length is 3
The ord in the split is 97
The ord in the split is 200 and is equal to U+0222
The ord in the split is 97
The ord in the index is 97
The ord in the index is 200 and is equal to U+0222
The ord in the index is 97
The ord outside the split is 546

Has anyone else come across this? Is there a way to use ord in a loop after a split that works?
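
One observation that may help narrow this down: 200 is 0xC8, which is the first byte of the two-byte UTF-8 encoding of U+0222 (0xC8 0xA2). So in the split and substr cases ord appears to be returning the leading byte of the encoded character rather than its code point, 546. The arithmetic can be checked with a minimal sketch like the one below. It assumes a later Perl that ships the Encode module (as far as I know it is not bundled with 5.6.0), and it only illustrates the byte values; it is not a fix for the behaviour above.

```perl
use strict;
use warnings;
use Encode qw(encode);   # assumption: Encode is available (later Perls)

# U+0222 is the character used in the test script above.
my $char = "\x{0222}";

# Encode the character to UTF-8 and split it into individual byte values.
my @bytes = unpack("C*", encode("UTF-8", $char));

# The code point, and the bytes that encode it.
printf "code point:  %d\n", ord($char);
printf "UTF-8 bytes: %s\n", join(" ", @bytes);
```

The first byte value listed is the one that shows up as the ord result inside the two loops.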
