Re: Odd regexp behavior


Markus(_dot_)Kuhn(_at_)cl(_dot_)cam(_dot_)ac(_dot_)uk said:

$ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
'    <= this denotes a \x{2019} followed by \n
k

$ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
b k 

[snip]

$ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
1


This behavior certainly does seem to contradict expectations.  I even 
thought that the third test might not be exactly equivalent to the 
first, so I tried this:

$ perl -e '$x = "\x{2019}"; print "x2019 matches \\S\n" if ( $x =~ /\S/ );'
x2019 matches \S


But since perl provides many ways of doing the same thing (or at least 
trying to), there is an "idiom" that will produce the expected result:

 require 5.008;

 use Encode;

 $x = encode( "utf8", "\x{2019}\nk" );
 $x =~ s/(\S)\n(\S)/$1 $2/sg;
 print "$x\n";

 __END__

 __OUTPUT__
 ' k

Even in this case, I was puzzled as to why I got the expected behavior
by calling encode() this way, but not when I called decode() instead.
(I would have expected it to be the other way around.)
Go figure...
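For what it's worth, here is a small sketch of what encode() does to the
string before the substitution runs. It does not explain the original
failure; it only shows that after encode() the regex is working on five
UTF-8 octets rather than three characters. The variable names ($chars,
$bytes, $joined) are mine, and I am assuming an Encode recent enough to
accept the "UTF-8" encoding name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character string: U+2019, newline, "k" (3 characters).
my $chars = "\x{2019}\nk";

# Octet string: the same text as UTF-8 bytes (E2 80 99 0A 6B, 5 octets).
my $bytes = encode("UTF-8", $chars);

printf "chars: length=%d utf8-flag=%s\n",
    length($chars), utf8::is_utf8($chars) ? "on" : "off";
printf "bytes: length=%d utf8-flag=%s\n",
    length($bytes), utf8::is_utf8($bytes) ? "on" : "off";

# On the octet string, the \S before the newline matches the single
# byte \x99, not the character U+2019.
(my $joined = $bytes) =~ s/(\S)\n(\S)/$1 $2/sg;

# Dump the result as hex so the output does not depend on the terminal.
printf "joined: %s\n",
    join " ", map { sprintf "%02X", ord } split //, $joined;
```

So the "idiom" above sidesteps the character-level match entirely: the
substitution only ever sees bytes, and the output happens to look right
on a UTF-8 terminal.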

        Dave Graff
