Re: Perl 5.6.1 and regex captures

On Thu, Feb 28, 2002 at 03:58:33PM +0000, Jean-Michel Hiver wrote:

Hi again,

Sorry to send so many messages, but one of my colleagues told me that
the sample script I sent wasn't clear enough. So here is my problem,
stripped down as much as I can:

[jhiver(_at_)frogette mkdoc]$ cat test2.pl 
use strict;
use utf8;

my $data = "Copyright \x{A9} 2001-2002 MKDoc Ltd";
print $data, "\n";
print $data =~ /(.*)/, "\n";


[jhiver(_at_)frogette mkdoc]$ perl test2.pl 
Copyright Â© 2001-2002 MKDoc Ltd
Copyright © 2001-2002 MKDoc Ltd


As you can see, the string has been converted from UTF-8 to Latin-1 just
by capturing it... How come? How can I avoid it? I've run several
searches along the lines of 'perl unicode regex capture' on Google but
came up with no relevant hits :-(


What you are seeing is a bug in Perl 5.6.1.  The upcoming 5.8.0
has this fixed.
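
For anyone who wants to check their own installation, here is a minimal
sketch, assuming Perl 5.8.0 or later (it relies on the Encode module and
PerlIO encoding layers, neither of which is in 5.6.1). The variable name
$captured is chosen here for illustration only. It compares the captured
string with the original, both as characters and as UTF-8 bytes, so the
result does not depend on how the terminal renders the copyright sign:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same test string as in the report; \x{A9} is the copyright sign.
my $data = "Copyright \x{A9} 2001-2002 MKDoc Ltd";

# List-context match returns the contents of the capture group.
my ($captured) = $data =~ /(.*)/;

# Compare as character strings, then as explicitly encoded UTF-8 bytes.
my $same_chars = $captured eq $data;
my $same_bytes = encode("UTF-8", $captured) eq encode("UTF-8", $data);

print "same characters:  ", ($same_chars ? "yes" : "no"), "\n";
print "same UTF-8 bytes: ", ($same_bytes ? "yes" : "no"), "\n";

# State the output encoding explicitly instead of relying on defaults.
binmode(STDOUT, ":encoding(UTF-8)");
print $captured, "\n";
```

If either comparison prints "no", the capture has altered the string.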

Cheers,
-- 
IT'S TIME FOR A DIFFERENT KIND OF WEB
================================================================
  Jean-Michel Hiver - Software Director
  jhiver(_at_)mkdoc(_dot_)com
  +44 (0)114 221 4968
================================================================
                                      VISIT HTTP://WWW.MKDOC.COM


-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen