perl-unicode

Character classes with Unicode

2002-02-15 11:27:44
Hello,

I can't get character classes in regular expressions to work with
Unicode characters.  I've tried putting both the literal Unicode
characters and the \x{XX} notation within square brackets [] to create
a character class, but neither works.  I've tried with both the
developer release of Perl 5.7.2 and the daily build from 2002/02/13.

Here's an example of some code that isn't working for me:
---
#!/usr/local/bin/perl5.7.2
use Encode;
use utf8;

$string = encode_utf8("f\x{e9}lise");
$string =~ s/f[e\x{e8}\x{e9}\x{ea}\x{eb}]lise/SUCCESS/; #does not match
print "new string: $string\n";
---

With another approach, this works:

#!/usr/local/bin/perl5.7.2
use Encode;
use utf8;

$string = encode_utf8("f\x{e9}lise");
$regex = encode_utf8("f\x{e9}lise");
$string =~ s/$regex/SUCCESS/; #matches
print "new string: $string\n";

While this does not:

#!/usr/local/bin/perl5.7.2
use Encode;
use utf8;

$string = encode_utf8("f\x{e9}lise");
$regex = encode_utf8("f[\x{e9}\x{e8}]lise");
$string =~ s/$regex/SUCCESS/; #does not match
print "new string: $string\n";
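
For comparison, here is a minimal sketch of the same character class
applied to a character string and to its UTF-8 encoded bytes.  It assumes
a current Perl with the Encode module, and has not been checked against
5.7.2 or the daily builds.  The names $chars, $bytes and $class are
illustrative only and do not appear in the examples above.  It is meant to
show one possible cause: encode_utf8 returns a byte string in which
U+00E9 occupies two bytes (0xC3 0xA9), while a bracketed class matches
exactly one element of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode_utf8 decode_utf8);

# Character string: U+00E9 is one character here.
my $chars = "f\x{e9}lise";

# Byte string: encode_utf8 turns U+00E9 into the two bytes 0xC3 0xA9.
my $bytes = encode_utf8($chars);

# The character class from the first example, compiled once.
my $class = qr/f[e\x{e8}\x{e9}\x{ea}\x{eb}]lise/;

# Match against the character string, the encoded bytes, and the
# bytes decoded back into characters.
my $char_match    = ($chars =~ $class)              ? 1 : 0;
my $byte_match    = ($bytes =~ $class)              ? 1 : 0;
my $decoded_match = (decode_utf8($bytes) =~ $class) ? 1 : 0;

printf "lengths: chars=%d bytes=%d\n", length($chars), length($bytes);
printf "matches: chars=%d bytes=%d decoded=%d\n",
    $char_match, $byte_match, $decoded_match;
```

If this reading is right, the second example matches only because the
pattern and the string went through the same encoding, so the two bytes
line up literally.  In the third example the class would hold the
individual bytes 0xC3, 0xA9 and 0xA8 and consume just one of them.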

Should examples 1 and 3 be working?  Thanks for listening.

John
| John A. Walsh, Manager, Electronic Text Technologies
| Digital Library Program / University Information Technology Services (UITS)
| Indiana University, 1320 East Tenth Street, Bloomington, IN 47405
| Voice:812-855-8758 Fax:812-856-2062 <mailto:jawalsh(_at_)indiana(_dot_)edu>
