perl-unicode

Re: use bytes

2004-05-19 02:30:07


ed-perluni(_at_)inkdroid(_dot_)org said:
I'm confused, can someone tell me why:

    #!/usr/bin/perl
    use bytes;
    $x = chr( 400 );
    print "Length is ", length( $x ), "\n";

prints 1, while

    #!/usr/bin/perl
    $x = chr( 400 );
    use bytes;
    print "Length is ", length( $x ), "\n";

prints 2? 

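The position of the "use bytes" pragma matters. It is lexically scoped, so
it only affects the code that follows it, up to the end of the enclosing
block or file. In that code, strings that might hold wide characters are
handled as sequences of bytes instead of as Unicode characters. In your
first script chr(400) already runs under the pragma, so it yields a single
byte; in the second, $x holds a real wide character and only length() looks
at its bytes.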
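
To make the scoping concrete, here is a minimal sketch that confines the pragma to a bare block. The variable names are mine, not from the original scripts, and the lengths in the comments are what I would expect from the behaviour described above:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = chr(400);          # one wide character, U+0190

my $chars = length($x);    # character semantics: 1
my $bytes;
{
    use bytes;             # in effect only until the closing brace
    $bytes = length($x);   # byte semantics: 2
}
my $after = length($x);    # outside the block, character semantics again: 1

print "chars=$chars bytes=$bytes after=$after\n";
```

Keeping the pragma inside a small block like this avoids changing the meaning of later code by accident.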

There is a third case, without "use bytes" in there at all, which would 
also print 1.  But here is a version that might be more enlightening:

#!/usr/bin/perl

$x = chr(400);
printf( "set x = %x; length of %x is %d\n", 400, ord($x), length($x) );

# prints "set x = 190; length of 190 is 1"
# note that "190" here means Unicode code point U+0190
# (Latin capital letter open E, also called epsilon)

use bytes;
printf( "byte length of x is %d : %x %x\n", length($x),
        map{ord()} split( //, $x ));

# prints "byte length of x is 2 : c6 90"
# where "c6 90" is the two-byte UTF-8 representation of U+0190

# still using bytes at this point...

$x = chr(400);  # doesn't do what you want: can't have byte characters > 255

printf( "set x = %x; x is really %x with length %d\n", 400, ord($x),
        length($x) );

# prints "set x = 190; x is really 90 with length 1"
# note that the bits above 0xFF have been ignored (0x190 & 0xFF is 0x90).


Hope that clears things up.

        Dave Graff



