Odd regexp behavior

Dear UTF-8 regular expression gurus:

$ perl -e '$x = "\x{2019}\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
'    <= this denotes a \x{2019} followed by \n
k
$ perl -e '$x = "b\nk"; $x =~ s/(\S)\n(\S)/$1 $2/sg; print "$x\n";'
b k

Any idea why the Unicode apostrophe (U+2019) is not matched by \S in the
first case, whereas the 'b' is?

Also note that

$ perl -e 'print (("\x{2019}" =~ /\S/) . "\n");'
1

so \x{2019} *does* match \S in principle ... odd.
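
One way to narrow this down might be to test the pieces of the pattern separately. The following is only a rough sketch: check_patterns and its labels are made-up names for illustration, and it reports which sub-patterns match rather than assuming any particular result on a given Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Perl): run each piece of the failing
# pattern against $text and return [label, matched?] pairs.
sub check_patterns {
    my ($text) = @_;
    my @checks = (
        [ 'lone \S'      => qr/\S/ ],          # same test as the one-liner above
        [ '\S before \n' => qr/\S\n/ ],        # left half of the pattern
        [ '\S after \n'  => qr/\n\S/ ],        # right half of the pattern
        [ 'full pattern' => qr/(\S)\n(\S)/s ], # pattern used in the s///
    );
    return map { [ $_->[0], ($text =~ $_->[1] ? 1 : 0) ] } @checks;
}

# Report for the problematic string: U+2019, newline, "k".
for my $r (check_patterns("\x{2019}\nk")) {
    printf "%-14s %s\n", $r->[0], ($r->[1] ? "match" : "no match");
}
```

If only the rows with \S next to \n report "no match" for the U+2019 string, while "b\nk" matches in every row, that would suggest the problem lies in how \S is combined with a following \n rather than in \S itself.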

(Perl v5.6.0 built for i386-linux)

Markus

-- 
Markus Kuhn, Computer Lab, Univ of Cambridge, GB
http://www.cl.cam.ac.uk/~mgk25/ | __oo_O..O_oo__
