perl-unicode

Re: range operator vs. unicode

2006-06-08 04:35:55
On Thu, Jun 08, 2006 at 05:03:15PM +0900, Dan Kogai wrote:
I found that ranges like ('a'..'z') work only for alphanumerics.  Try the code
below:

use strict;
use warnings;
#use utf8;
use charnames ':full';
binmode STDOUT, ':utf8';
# works
print "$_\n" for ("\N{LATIN CAPITAL LETTER A}" .. "\N{LATIN CAPITAL LETTER Z}");
# (0..9, 'A'..'Z', 'a'..'z'); symbols skipped
print "$_\n" for ("\N{DIGIT ZERO}" .. "\N{LATIN SMALL LETTER Z}");

Right.

# does not work
print "$_\n" for ("\N{LATIN SMALL LETTER A}" .. "\N{LEFT CURLY BRACKET}");

The above should print a, ..., z, and does so.  The next value in the
series after z is aa, which is longer than LEFT CURLY BRACKET, so the
range ends with z.

Since magical string increment doesn't apply to any of the starting
strings below, each of the next three ranges should return just its
starting element.
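To make the two stopping rules concrete, here is a small pure-Perl model of the string range.  It is only a sketch: string_range is a hypothetical helper written for this note, not Perl's actual C implementation, and it assumes just the perlop rules discussed in this thread (stop when the value would be longer than the final value; no magical increment unless the string is non-empty and matches /^[a-zA-Z]*[0-9]*\z/).

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper (not a Perl built-in): a pure-Perl model of the
# list-context string range, following the rules quoted from perlop.
sub string_range {
    my ($from, $to) = @_;
    my @out;
    my $cur = $from;
    # Rule 1: stop once the current value is longer than $to.
    while (length($cur) <= length($to)) {
        push @out, $cur;
        last if $cur eq $to;
        # Rule 2: only non-empty strings matching /^[a-zA-Z]*[0-9]*\z/
        # get the magical increment, so any other start yields itself only.
        last unless length($cur) && $cur =~ /^[a-zA-Z]*[0-9]*\z/;
        $cur++;    # magical string increment, e.g. 'z' -> 'aa'
    }
    return @out;
}

print join(' ', string_range('a', '{')), "\n";    # the letters a through z
print scalar(string_range("\N{GREEK CAPITAL LETTER ALPHA}",
                          "\N{GREEK CAPITAL LETTER OMEGA}")), "\n";    # 1
```

Under these assumptions the model agrees with the built-in operator on ('a' .. '{'), and returns only the starting element for the Greek range.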

print "$_\n" for ("\N{NO-BREAK SPACE}" .. "\N{LATIN SMALL LETTER Y WITH DIAERESIS}");
print "$_\n" for ("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}");
print "$_\n" for ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}");
__END__

There is an easy workaround, however.

my @katakana = map { chr } ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}");

Did you mean:
 ord("\N{KATAKANA LETTER SMALL A}") .. ord("\N{KATAKANA LETTER VO}");
?
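The ord-based version can be checked with a short script.  This is a sketch assuming a Perl where charnames ':full' is available; it builds the numeric range of code points U+30A1 .. U+30FA and maps each one back through chr, so every code point between the two endpoints is included.

```perl
use strict;
use warnings;
use charnames ':full';

# Numeric range over code points (ord), then back to characters (chr).
my @katakana = map { chr($_) }
    ord("\N{KATAKANA LETTER SMALL A}") .. ord("\N{KATAKANA LETTER VO}");

# prints: 90 characters, U+30A1 .. U+30FA
printf "%d characters, U+%04X .. U+%04X\n",
    scalar(@katakana), ord($katakana[0]), ord($katakana[-1]);
```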

Since we have a workaround above, I don't consider this range
implementation a bug; after all, we would be rather surprised if
("\x{0}" .. "\x{10FFFF}") worked.  But the following should be fixed so
that Greeks are not confused by the result of
("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}"),
Japanese users are not confused by
("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}"), and so forth.

Which part should be fixed?
 
perldoc perlop
      The range operator (in list context) makes use of the magical
      auto-increment algorithm if the operands are strings.  You can say

The key part is that perlop defines magical auto-increment earlier (under
"Auto-increment and Auto-decrement") as working only for non-empty strings
matching "/^[a-zA-Z]*[0-9]*\z/".
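A quick way to see which starting strings in this thread qualify is to test them against that pattern.  This sketch checks the starting characters of the ranges tried above and then repeats a few of perlop's own string-increment examples; $magic is just a local name chosen for the sketch.

```perl
use strict;
use warnings;
use charnames ':full';

# The pattern perlop gives for strings that get the magical increment
# ($magic is only a local name for this sketch).
my $magic = qr/^[a-zA-Z]*[0-9]*\z/;

# Starting characters of the ranges tried earlier in this thread.
my @starts = (
    "\N{LATIN CAPITAL LETTER A}",
    "\N{DIGIT ZERO}",
    "\N{LATIN SMALL LETTER A}",
    "\N{NO-BREAK SPACE}",
    "\N{GREEK CAPITAL LETTER ALPHA}",
    "\N{KATAKANA LETTER SMALL A}",
);
for my $s (@starts) {
    # Only the first three (U+0041, U+0030, U+0061) report "magic".
    printf "U+%04X %s\n", ord($s), ($s =~ $magic ? 'magic' : 'not magic');
}

# String increments from perlop's own examples.
for my $s ('99', 'a0', 'Az', 'zz') {
    my $next = $s;
    $next++;
    print "$s -> $next\n";    # 100, a1, Ba, aaa respectively
}
```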


          @alphabet = ('A' .. 'Z');

      to get all normal letters of the English alphabet, or

          $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

      to get a hexadecimal digit, or

          @z2 = ('01' .. '31');  print $z2[$mday];

      to get dates with leading zeros.  If the final value specified is not
      in the sequence that the magical increment would produce, the sequence
      goes until the next value would be longer than the final value
      specified.
