ed-perluni(_at_)inkdroid(_dot_)org said:
I'm confused, can someone tell me why:
#!/usr/bin/perl
use bytes;
$x = chr( 400 );
print "Length is ", length( $x ), "\n";
prints 1, while
#!/usr/bin/perl
$x = chr( 400 );
use bytes;
print "Length is ", length( $x ), "\n";
prints 2?
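The positioning of the "use bytes" pragma is important. Like most pragmas, it
is lexically scoped: it affects only the code that follows it, up to the end
of the enclosing block or file. In that code, strings are treated as sequences
of bytes, so values that would otherwise be wide (Unicode) characters are
handled as their individual bytes instead. In your first script, chr(400)
itself runs under the pragma and yields the single byte 0x90. In your second
script, chr(400) creates the wide character U+0190 before the pragma takes
effect, and length() then counts its two UTF-8 bytes.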
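A minimal sketch of that lexical scoping, assuming perl 5.8 or later; the variable names are illustrative. The "use bytes" inside the braces changes length() only within that block. bytes::length(), loaded with "use bytes ();" (which does not switch the pragma on), gives the byte count without any lexical scope.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use bytes ();   # load bytes::length() without turning the pragma on

my $x = chr(400);                  # one wide character, U+0190

my $char_len = length($x);         # counts characters: 1

my $byte_len;
{
    use bytes;                     # in effect only until the closing brace
    $byte_len = length($x);        # counts the UTF-8 bytes 0xC6 0x90: 2
}

my $after_len = length($x);        # pragma out of scope again: 1
my $func_len  = bytes::length($x); # byte count via a function call: 2

print "chars: $char_len, bytes: $byte_len, after block: $after_len, ",
      "bytes::length: $func_len\n";
```

The block form is handy when only a few lines need byte semantics; the function form avoids changing the behaviour of any surrounding code.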
There is a third case, without "use bytes" in there at all, which would
also print 1. But here is a version that might be more enlightening:
#!/usr/bin/perl
$x = chr(400);
printf( "set x = %x; length of %x is %d\n", 400, ord($x), length($x) );
# prints "set x = 190; length of 190 is 1"
# note that "190" here means Unicode code point U+0190 (Latin capital letter open E, also called epsilon)
use bytes;
printf( "byte length of x is %d : %x %x\n", length($x), map{ord()} split( //, $x ));
# prints "byte length of x is 2 : c6 90"
# where "c6 90" is the two-byte UTF-8 representation of U+0190
# still using bytes at this point...
$x = chr(400); # doesn't do what you want: can't have byte characters > 255
printf( "set x = %x; x is really %x with length %d\n", 400, ord($x),
length($x));
# prints "set x = 190; x is really 90 with length 1"
# note that the bits above 0xFF have been discarded: 400 & 0xFF == 0x90
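The two byte values and the truncation shown above can also be checked programmatically. This sketch assumes utf8::encode(), a core built-in that converts a string to its UTF-8 bytes in place, and unpack "H*" to show those bytes as hex; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Encode a copy of U+0190 to UTF-8 without "use bytes".
my $encoded = chr(400);
utf8::encode($encoded);             # now a plain byte string
my $hex = unpack("H*", $encoded);   # hex dump of those bytes

# Under "use bytes", chr() keeps only the low 8 bits of its argument.
my $truncated;
{
    use bytes;
    $truncated = ord(chr(400));     # 400 & 0xFF == 0x90 == 144
}

printf "UTF-8 bytes: %s; chr(400) under bytes has ord 0x%x\n",
       $hex, $truncated;
```

Running it should print "UTF-8 bytes: c690; chr(400) under bytes has ord 0x90", matching the c6 90 and 90 values in the listing.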
Hope that clears things up.
Dave Graff