perl-unicode

Re: Matching encoded strings and file names

2005-12-20 17:04:25
At 10:46 am +0100 20/12/05, szpara_ga(_at_)tlen(_dot_)pl wrote:

...Let's say I have a text file that contains a list of strings. Some of these strings contain characters encoded like this:

R\xC3\xA9union (\xC3\xA9 is one character, e with an acute accent, written as its two UTF-8 bytes).

...Now, this fails, even though when I look at the file name it is Réunion. It fails because my $in =~ s/// didn't produce an accented e, although I've checked that \xC3\xA9 is the correct encoding for that character. Can you please tell me what I am doing wrong and, more generally, how to correctly make these kinds of string comparisons with non-ASCII characters?
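A minimal sketch, assuming the escapes in the list file spell out UTF-8 bytes (as they do for \xC3\xA9): the substitution does produce é, but as two bytes that Perl counts as two separate characters. Only after decoding with the core Encode module's decode does the string hold the single character U+00E9, and eq succeeds only when both sides are in the same form (both decoded characters, or both encoded bytes). The variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# The text as it appears in the list file: backslash, 'x', two hex digits
my $escaped = 'R\xC3\xA9union';

# Turn each \xHH escape into the single byte it names
(my $bytes = $escaped) =~ s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/eg;

# Interpret those bytes as UTF-8 to get a character string
my $chars = decode('UTF-8', $bytes);

# "Réunion" written with the character escape for e-acute (U+00E9)
my $wanted = "R\x{E9}union";

printf "bytes: %d, characters: %d\n", length($bytes), length($chars);
print 'bytes eq characters:     ', ($bytes eq $wanted ? 'yes' : 'no'), "\n";
print 'decoded eq characters:   ', ($chars eq $wanted ? 'yes' : 'no'), "\n";
print 'bytes eq encoded wanted: ', ($bytes eq encode('UTF-8', $wanted) ? 'yes' : 'no'), "\n";
```

Run as-is, it should report 8 bytes against 7 characters, and "no" only for the mixed comparison of bytes against characters.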

If I run this, which I think reproduces your situation, first with a string in the script and then with text read from a file:

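        #!/usr/bin/perl
        use strict;
        use warnings;
        # 1. String in the script (single quotes keep \xC3\xA9 as literal text)
        my $in = 'R\xC3\xA9union' . $/;
        $in =~ s~\\x(..)~chr(hex($1))~eg;   # each \xHH escape becomes that byte
        print $in;
        # 2. Same text written to a file, then read back and unescaped
        my $testtext = 'R\xC3\xA9union' . $/;
        my $testfile = "$ENV{HOME}/test.txt";
        open TEST, '>', $testfile or die "Cannot write $testfile: $!";
        print TEST $testtext;
        close TEST;
        open TEST, '<:encoding(us-ascii)', $testfile or die "Cannot read $testfile: $!";
        while (<TEST>) {
          s~\\x(..)~chr(hex($1))~eg;
          print;
        }
        close TEST;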

I get

        Réunion
        Réunion

Do you get a different result?
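For file names specifically, one more hedged sketch: it assumes the names come back from the file system (for example from readdir) as UTF-8 bytes, and it uses the core Unicode::Normalize module because some file systems, Mac OS X's HFS+ among them, store é in decomposed form (e followed by the combining acute accent U+0301). The helpers unescape_list_entry and normalize_file_name are hypothetical names for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC);

# Hypothetical helper: turn an escaped line from the list into a normalized
# character string (assumes the escapes spell out UTF-8 bytes)
sub unescape_list_entry {
    my ($line) = @_;
    (my $bytes = $line) =~ s/\\x([0-9A-Fa-f]{2})/chr(hex($1))/eg;
    return NFC(decode('UTF-8', $bytes));
}

# Hypothetical helper: turn a raw file name (bytes, as readdir returns them)
# into a normalized character string (assumes UTF-8 file names)
sub normalize_file_name {
    my ($raw) = @_;
    return NFC(decode('UTF-8', $raw));
}

my $from_list = unescape_list_entry('R\xC3\xA9union');

# The same name as a file system might hand it back: precomposed (C3 A9)
# or decomposed into 'e' plus a combining acute accent (CC 81)
my @names = ("R\xC3\xA9union", "Re\xCC\x81union");

for my $raw (@names) {
    my $result = normalize_file_name($raw) eq $from_list ? 'match' : 'no match';
    print "$result\n";
}
```

Normalizing both sides to NFC before eq makes the precomposed and decomposed spellings compare equal, so both names should print "match".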

JD