perl-unicode

[PATCH] Re: range operator vs. unicode

2006-06-08 04:35:32
On Thu, Jun 08, 2006 at 05:56:13PM +0900, Dan Kogai wrote:
On Jun 08, 2006, at 17:34 , Yitzchak Scott-Thoennes wrote:
Which part should be fixed?

The limitation of the magic, namely....

The key part is that magical auto-increment is defined earlier as
only working for strings matching "/^[a-zA-Z]*[0-9]*\z/".

Which is described in "Auto-increment and Auto-decrement", though the
"Range Operator" section does mention it:

perldoc perlop
      The range operator (in list context) makes use of the magical
      auto-increment algorithm if the operands are strings.
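As a quick illustration of the magic that perlop sentence refers to, the sketch below increments a few strings and then builds two string ranges. It uses only core Perl; the sample strings are arbitrary choices for illustration, not examples taken from perlop.

```perl
use strict;
use warnings;
no warnings 'numeric';   # incrementing "+" may warn that it isn't numeric; harmless here

# Magical string increment applies only to non-empty strings matching /^[a-zA-Z]*[0-9]*\z/.
my %after;
for my $start ('a9', 'Az', 'zz', '+') {
    my $v = $start;
    $v++;                        # string increment for the first three, numeric for "+"
    $after{$start} = $v;         # a9 -> b0, Az -> Ba, zz -> aaa, + -> 1
}

# In list context the range operator walks the same sequence.
my @two_letter = ('aa' .. 'ad');   # aa ab ac ad
my @rollover   = ('x'  .. 'ab');   # x y z aa ab
```

Note how "+" falls back to numeric increment (it numifies to 0, so it becomes 1); that fallback is what ends a string range early.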

This would make lawyers happy enough, but not (Uni)?coders like
myself.  With the advent of Unicode support, more people will attempt
things like ("\N{alpha}" .. "\N{omega}") and wonder why it does not
work like ("a".."z").  So we should add something like:

=head2 CAVEAT

Note that the range operator cannot apply magic beyond C<[a-zA-Z0-9]>.
Therefore

  use charnames 'greek';
  my @greek_small =  ("\N{alpha}" .. "\N{omega}");

does not work.  If you want non-ASCII ranges, try

  my @greek_small =  map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );
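The same ord/chr workaround generalizes to any two characters, including ones held in variables. The helper name char_range below is hypothetical, introduced only for this sketch. One detail worth knowing: the code-point range from alpha (U+03B1) to omega (U+03C9) has 25 entries, not 24, because it includes U+03C2 GREEK SMALL LETTER FINAL SIGMA.

```perl
use strict;
use warnings;
use charnames qw(greek);   # lets "\N{alpha}" name GREEK SMALL LETTER ALPHA

# Hypothetical helper, not part of Perl: every character whose code point lies
# between those of $from and $to, inclusive. Works for variables as well as literals.
sub char_range {
    my ($from, $to) = @_;
    return map { chr } (ord($from) .. ord($to));
}

my @greek_small = char_range("\N{alpha}", "\N{omega}");   # U+03B1 .. U+03C9, 25 characters

# U+03C2 (final sigma) sits inside that block; remove it to keep the 24 alphabet letters.
my @letters = grep { $_ ne "\x{3C2}" } @greek_small;
```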

On the other hand, ranges in regexp character classes and C<tr///> work.
You may consider this inconsistent, but the range operator must accept
variables like C<($start .. $end)>, while character ranges in a regexp
are constant.
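For comparison, here is a sketch of the character ranges that do work with Greek letters, in a regexp character class and in tr///. The sample string is arbitrary. The uppercase mapping assumes the Greek capitals sit exactly 0x20 code points below the small letters, which holds for this block; it is only a sketch, since final sigma (U+03C2) would map to the unassigned U+03A2.

```perl
use strict;
use warnings;
use charnames qw(greek);

my $s = "\N{alpha}\N{beta}\N{gamma}!";

# Regex character-class range over Greek small letters (U+03B1 .. U+03C9).
my @found = ($s =~ /[\x{3B1}-\x{3C9}]/g);        # alpha, beta, gamma

# tr/// with an empty replacement list counts matches and leaves $s unchanged.
my $count = ($s =~ tr/\x{3B1}-\x{3C9}//);

# tr/// range to range: map each small letter to the code point 0x20 below it.
(my $upper = $s) =~ tr/\x{3B1}-\x{3C9}/\x{391}-\x{3A9}/;   # capital alpha, beta, gamma, then "!"
```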

Hmm, we don't seem to document even what something like "+" .. "-"
does.  How does this look:

--- perl/pod/perlop.pod.orig    2006-05-15 09:48:33.000000000 -0700
+++ perl/pod/perlop.pod 2006-06-08 02:30:45.500000000 -0700
@@ -648,10 +648,22 @@
 
     @z2 = ('01' .. '31');  print $z2[$mday];
 
-to get dates with leading zeros.  If the final value specified is not
-in the sequence that the magical increment would produce, the sequence
-goes until the next value would be longer than the final value
-specified.
+to get dates with leading zeros.
+
+If the final value specified is not in the sequence that the magical
+increment would produce, the sequence goes until the next value would
+be longer than the final value specified.
+
+If the initial value specified isn't part of a magical increment
+sequence (that is, matching "/^[a-zA-Z]*[0-9]*\z/"), only the initial
+value will be returned.  So the following will only return an alpha:
+
+    use charnames 'greek';
+    my @greek_small =  ("\N{alpha}" .. "\N{omega}");
+
+Use this instead:
+
+    my @greek_small =  map { chr } ( ord("\N{alpha}") .. ord("\N{omega}") );
 
 Because each operand is evaluated in integer form, C<2.18 .. 3.14> will
 return two elements in list context.
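To check the cases the patch documents, including the "+" .. "-" question above, here is a small script. The expected results in the comments follow the rules stated in the patch text and its context lines; the variable names are arbitrary.

```perl
use strict;
use warnings;
no warnings 'numeric';   # non-magic start values are incremented numerically; silence any warning
use charnames qw(greek);

my @plus_minus = ("+" .. "-");                        # "+" is not a magic start value: ("+")
my @z2         = ('01' .. '31');                      # leading "0" keeps it a string range
my @a_to_B     = ('a' .. 'B');                        # "B" is never reached; stops before length 2
my @greek      = ("\N{alpha}" .. "\N{omega}");        # not a magic start value: only alpha
my @by_code    = map { chr } (ord("+") .. ord("-"));  # code-point walk instead: "+", ",", "-"
my @floats     = (2.18 .. 3.14);                      # operands used as integers: (2, 3)
```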
