perl-unicode

Perl 5.6.1 and regex captures

2002-02-26 04:43:52
Hi list,

  I am running Perl 5.6.1 on a Red Hat box, and I have come across this
  weird (bug|feature|annoying thing). If this problem has been raised
  before, please give me a reference to the "F" manual :-)

TEST SCRIPT:
============

use strict;
use utf8;

main();
sub main
{
    # \x{A9} is the copyright sign (U+00A9)
    #
    my $data = "Copyright \x{A9} 2001-2002 MKDoc Ltd";
    my $dlm = '(?:\p{IsSpace}|\p{IsPunct})';
    my $re = 'MKDoc';

    print "BEFORE: $data\n";
    my @split = $data =~ /^(.*?$dlm)($re)($dlm.*?)$/ism;
    $data = join '', @split;
    print "AFTER : $data\n";
}

1;


And here is what I get

[jhiver(_at_)frogette mkdoc]$ perl -w test2.pl
BEFORE: Copyright © 2001-2002 MKDoc Ltd
AFTER : Copyright © 2001-2002 MKDoc Ltd
[jhiver(_at_)frogette mkdoc]$ 


My terminal doesn't support UTF-8, which in this case is good because I
can see all the characters... surprise: using regex captures seems to
remove the string's utf8ness, although the string IS utf8 and 'use utf8'
is there...
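
One way to check this directly, rather than by looking at terminal output, is to inspect the internal UTF-8 flag of each capture. The following is only a minimal sketch under assumptions: it relies on utf8::is_utf8 and utf8::upgrade, which are not available in 5.6.1 (they need Perl 5.8.1 or later), and capture_report is a hypothetical helper name, not part of the test script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): runs the same match
# and reports, for each capture, its text, its length in characters, and
# whether the internal UTF-8 flag is set.
# Assumes Perl 5.8.1+ for utf8::upgrade and utf8::is_utf8.
sub capture_report {
    my ($data) = @_;
    utf8::upgrade($data);    # force the internal UTF-8 representation

    my $dlm = '(?:\p{IsSpace}|\p{IsPunct})';
    my $re  = 'MKDoc';
    my @split = $data =~ /^(.*?$dlm)($re)($dlm.*?)$/ism;

    return map {
        +{
            text   => $_,
            length => length $_,
            utf8   => utf8::is_utf8($_) ? 1 : 0,
        }
    } @split;
}

my @report = capture_report("Copyright \x{A9} 2001-2002 MKDoc Ltd");
for my $i (0 .. $#report) {
    printf "capture %d: length %d, utf8 flag %s\n",
        $i + 1, $report[$i]{length}, $report[$i]{utf8} ? 'on' : 'off';
}
```

If the flag is reported as off for a capture taken from a flagged string, that would point at the capture step rather than at the terminal or at join.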

Any ideas?
Cheers,

-- 
IT'S TIME FOR A DIFFERENT KIND OF WEB
================================================================
  Jean-Michel Hiver - Software Director
  jhiver(_at_)mkdoc(_dot_)com
  +44 (0)114 221 4968
================================================================
                                      VISIT HTTP://WWW.MKDOC.COM
