perl-unicode

range operator vs. unicode

2006-06-08 01:04:11
Porters,

I found that the range operator, as in ('a'..'z'), works only for ASCII alphanumerics. Try the code below:

use strict;
use warnings;
#use utf8;
use charnames ':full';
binmode STDOUT, ':utf8';
# works
print "$_\n" for ("\N{LATIN CAPITAL LETTER A}" .. "\N{LATIN CAPITAL LETTER Z}");
# (0..9, 'A'..'Z', 'a'..'z'); symbols skipped
print "$_\n" for ("\N{DIGIT ZERO}" .. "\N{LATIN SMALL LETTER Z}");
# does not work
print "$_\n" for ("\N{LATIN SMALL LETTER A}" .. "\N{LEFT CURLY BRACKET}");
print "$_\n" for ("\N{NO-BREAK SPACE}" .. "\N{LATIN SMALL LETTER Y WITH DIAERESIS}");
print "$_\n" for ("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}");
print "$_\n" for ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}");
__END__

There is an easy workaround, however.

my @katakana = map { chr } (ord("\N{KATAKANA LETTER SMALL A}") .. ord("\N{KATAKANA LETTER VO}"));
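
For what it's worth, the same idea can be wrapped in a small helper. This is only a sketch: codepoint_range is a hypothetical name of mine, not anything in core, and it assumes both arguments are single characters. It walks the numeric code points with ord and turns each one back into a character with chr.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper (not part of Perl core): returns every character
# from $first to $last inclusive, ordered by code point.
# Assumes each argument is a single character.
sub codepoint_range {
    my ($first, $last) = @_;
    return map { chr } ord($first) .. ord($last);
}

my @greek = codepoint_range(
    "\N{GREEK CAPITAL LETTER ALPHA}",
    "\N{GREEK CAPITAL LETTER OMEGA}",
);

binmode STDOUT, ':utf8';
print "$_\n" for @greek;
```

Note that this yields every code point in the interval, assigned or not. The Greek range above, for instance, includes the reserved U+03A2, so you may want to filter the result.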


Since we have the workaround above, I don't consider this range behavior a bug; after all, we would be rather surprised if ("\x0" .. "\x{10FFFF}") worked. But the following documentation should be fixed so that Greeks are not confused by the result of ("\N{GREEK CAPITAL LETTER ALPHA}" .. "\N{GREEK CAPITAL LETTER OMEGA}"), Japanese are not confused by ("\N{KATAKANA LETTER SMALL A}" .. "\N{KATAKANA LETTER VO}"), and so forth.

perldoc perlop
       The range operator (in list context) makes use of the magical
       auto-increment algorithm if the operands are strings.  You can say

           @alphabet = ('A' .. 'Z');

       to get all normal letters of the English alphabet, or

           $hexdigit = (0 .. 9, 'a' .. 'f')[$num & 15];

       to get a hexadecimal digit, or

           @z2 = ('01' .. '31');  print $z2[$mday];

       to get dates with leading zeros.  If the final value specified is not
       in the sequence that the magical increment would produce, the sequence
       goes until the next value would be longer than the final value
       specified.
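
To make the quoted rule concrete, here is a small sketch of two cases. The variable names are mine. The first case follows directly from the text above. The second relies on my understanding that magical increment applies only to strings of ASCII letters followed by digits, so a range starting from any other string yields just its first element.

```perl
use strict;
use warnings;

# The final value '{' is never produced by incrementing 'a', so the
# list stops before the first value longer than '{' (that is, 'aa').
my @latin = ('a' .. '{');

# The starting string is not ASCII letters followed by digits, so
# magical increment does not apply to it.
my @greek = ("\x{391}" .. "\x{3A9}");   # ALPHA .. OMEGA

printf "latin: %d elements, last is %s\n", scalar @latin, $latin[-1];
printf "greek: %d element(s)\n", scalar @greek;
```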

Dan the Man with Too Many Characters to Squeeze in the Range
