On Wed, 16 Apr 2003 16:40:52 +0100
Jean-Michel Hiver <jhiver(_at_)mkdoc(_dot_)com> wrote:
Hi List,
I am trying to use regular expressions on unicode strings, trying to
match for right-to-left characters using \p{BidiR} but I can't seem to
get it to work... Here's my test script:
use Encode;
use Test::More 'no_plan';
use strict;
use warnings;
use utf8;
# This string is "What is Unicode?" in Arabic.
# Hence it's got plenty of right-to-left characters.
my $text = "ما هي الشفرة الموحدة يونِكود ؟";
# make sure that the utfness of the string is known by perl
ok (Encode::is_utf8 ($text), 'utf8 flag is on');
# perldoc perlunicode says:
# For example, "\p{BidiR}" matches characters that are normally written
# right to left.
like ($text, qr/\p{BidiR}/, 'text has some right to left characters');
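\p{BidiAL} should be used for Arabic letters instead of \p{BidiR}.
Arabic letters have the Bidi class AL (Arabic Letter), not R; class R
covers other right-to-left scripts such as Hebrew, so \p{BidiR} finds
nothing to match in your string.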
http://www.unicode.org/Public/UNIDATA/extracted/DerivedBidiClass.txt
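For illustration, here is a minimal sketch of a check that accepts either
right-to-left class. It assumes a Perl recent enough to understand the
compound \p{Bidi_Class=...} spelling from perluniprops rather than the
older \p{BidiAL} form, and has_rtl_char() is a hypothetical helper name,
not something from the original script.

```perl
use strict;
use warnings;
use utf8;
use Test::More tests => 3;

# Hypothetical helper: returns 1 if $text contains at least one character
# whose Bidi class is R (e.g. Hebrew) or AL (Arabic letter), else 0.
sub has_rtl_char {
    my ($text) = @_;
    return $text =~ /[\p{Bidi_Class=R}\p{Bidi_Class=AL}]/ ? 1 : 0;
}

# "What is Unicode?" in Arabic, as in the original test script.
my $text = "ما هي الشفرة الموحدة يونِكود ؟";

# Arabic letters are class AL, so this matches ...
like($text, qr/\p{Bidi_Class=AL}/, 'text has Arabic (AL) characters');

# ... while class R alone does not match anything in this string.
unlike($text, qr/\p{Bidi_Class=R}/, 'text has no class R characters');

# Checking both classes covers Arabic and Hebrew text alike.
ok(has_rtl_char($text), 'text has some right-to-left characters');
```

Testing R and AL together is probably what the original test intended,
since it asks about right-to-left characters in general rather than
about Arabic specifically.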
SADAHIRO Tomoyuki