Markus(_dot_)Kuhn(_at_)cl(_dot_)cam(_dot_)ac(_dot_)uk said:
$ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
' <= this denotes a \x{2019} followed by \n
k
$ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
b k
[snip]
$ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
1
This behavior certainly does seem to contradict expectations. I even
thought that the third test might not be exactly equivalent to the
first, so I tried this:
$ perl -e '$x = "\x{2019}"; print "x2019 matches \\S\n" if ( $x =~ /\S/ );'
x2019 matches \S
But since Perl provides many ways of doing the same thing (or at least
trying to), there is an "idiom" that will produce the expected result:
require 5.008;
use Encode;
$x = encode( "utf8", "\x{2019}\nk" );
$x =~ s/(\S)\n(\S)/$1 $2/sg;
print "$x\n";
__END__
__OUTPUT__
' k
Even in this case, I was puzzled as to why I got the expected behavior
by using the "encode()" function this way, but not when I used "decode()"
instead. (I would have expected it to be the other way around.)
Go figure...
Dave Graff