perl-unicode

no. of bytes for each matching code pt in $1

2005-06-02 07:05:24
Hi,
I am running the following on z/OS with perl-5.8.6:

$a = 128;
$b = 256;

for ($i=$a;$i<=$b;$i++)
{
  $str = join '', $str, pack 'U*', $i;
}

if ($str =~ /(\p{inlatin1supplement}+)/)
{
    print "\$1 : $1\n";
}
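To see the byte counts directly instead of inferring them from printed output, a minimal sketch like the following can list each character captured in $1 with its native ordinal and the number of bytes it occupies in perl's internal string representation. It assumes perl 5.8 or later and uses the core `bytes::length` function. The sub name `capture_byte_counts` is hypothetical. On ASCII platforms the internal form is UTF-8. On EBCDIC platforms such as z/OS it is UTF-EBCDIC, so the counts there may differ.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without turning on byte semantics

# Hypothetical helper: build the test string for $lo..$hi the same way as
# the loop above, then return [native ordinal, internal byte count] for
# each character captured in $1.
sub capture_byte_counts {
    my ($lo, $hi) = @_;
    my $str = join '', map { pack 'U*', $_ } $lo .. $hi;
    return () unless $str =~ /(\p{InLatin1Supplement}+)/;
    my $match = $1;
    # split // yields one-character strings that keep the UTF-8 flag,
    # so bytes::length reports each character's internal width.
    return map { [ ord($_), bytes::length($_) ] } split //, $match;
}

printf "%d: %d byte(s)\n", @$_ for capture_byte_counts(128, 256);
```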

I get the following results:
a) $a = 128, $b = 256:
   $1 has 1-byte representations for each of 128-159
   and 2-byte representations for each of 160-255.
b) $a = 160, $b = 240:
   $1 has 2-byte representations for each of 160-240.
c) $a = 192, $b = 240:
   $1 has 1-byte representations for the whole range 192-240.
d) $a = 192, $b = 256:
   $1 has 1-byte representations for each of 192-255.

So, depending on the range used to construct $str, $1
contains 1-byte representations, 2-byte representations,
or a mix of both for the matching code points.
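One way to compare the four cases side by side is to run them in a single script and summarize each one: the number of captured characters, the total internal byte length of $1, whether its UTF-8 flag is set, and how many characters have each byte width. This is a sketch that assumes perl 5.8.1 or later (for `utf8::is_utf8`). The name `summarize` and the case labels are illustrative and do not come from perl itself.

```perl
use strict;
use warnings;
use bytes ();    # provides bytes::length

# Illustrative helper: build the string for $lo..$hi, capture the
# Latin-1 Supplement run in $1, and summarize how it is stored.
sub summarize {
    my ($lo, $hi) = @_;
    my $str = join '', map { pack 'U*', $_ } $lo .. $hi;
    return unless $str =~ /(\p{InLatin1Supplement}+)/;
    my $match = $1;
    my %width;    # internal byte width => number of characters
    $width{ bytes::length($_) }++ for split //, $match;
    return {
        chars => length($match),          # character count
        bytes => bytes::length($match),   # internal byte count
        flag  => utf8::is_utf8($match) ? 1 : 0,
        width => \%width,
    };
}

# The ranges from cases (a)-(d).
my @cases = ([a => 128, 256], [b => 160, 240], [c => 192, 240], [d => 192, 256]);

for my $case (@cases) {
    my ($label, $lo, $hi) = @$case;
    my $s = summarize($lo, $hi) or next;
    my $w = $s->{width};
    printf "(%s) %d-%d: chars=%d bytes=%d utf8=%d widths=%s\n",
        $label, $lo, $hi, @{$s}{qw(chars bytes flag)},
        join(',', map { "$_:$w->{$_}" } sort { $a <=> $b } keys %$w);
}
```

Running the same script on an ASCII machine and on z/OS may help show whether the difference lies in how $1 is stored internally or only in how it is written out.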

1) Is this behaviour incorrect, and does it need to be
fixed so that $1 always contains 1-byte representations
only? (On ASCII platforms, $1 always contains only 1-byte
representations for any matching code point below 256.)
2) If it is correct, then what is significant about
code point 192 that changes $1 from 2-byte
representations (case b above) to 1-byte representations
(case c above), even though $b = 240 in both cases?

Thanks in advance,
Rajarshi Das