On Thu, 28 Apr 2011 10:06:58 -0700 (PDT)
Frank Müller <pottwal1(_at_)freenet(_dot_)de> wrote:
Dear all,
I'm trying to do some string replacements with Unicode::Collate, which
usually works very well, but these replacements seem to be case
insensitive by default. How can I change this? Look at this simple
example:
use Unicode::Collate;
my $myCollator = Unicode::Collate->new( normalization => undef, level => 1 );
my $str = "Camel camel donkey zebra came\x{301}l CAMEL horse cAmEL...";
# gsubst modifies $str in place; the callback receives the matched text
$myCollator->gsubst($str, "camel", sub { "#$_[0]#" });
This makes the following replacements:
#Camel# #camel# donkey zebra #camél# #CAMEL# horse #cAmEL#...
What I would love to see is the following result:
Camel #camel# donkey zebra #camél# CAMEL horse cAmEL...
As there doesn't seem to be gsubst for case sensitive and gisubst for
case insensitive string replacements, what would a solution look like?
Thanks a lot for any suggestions,
Frank
Since (level => 1) is not the default, you can also specify
(level => 3) for case-sensitive matching. But UCA treats an accent
difference (level 2) as more significant than a case difference
(level 3), so camél won't match camel when (level => 3).
level 1: camel matches camél and Camel.
level 2: camel matches Camel but not camél.
level 3: camel matches neither Camel nor camél.
Even at level 3, matching isn't fully strict:
camel still matches "c-a-m-e-l", "ca mel", etc.,
since a punctuation difference is level 4.
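
The level behaviour described above can be checked with a short script. This is only a sketch: matches_at_level is a hypothetical helper name, and it uses the collator's eq method on whole strings rather than gsubst, assuming the default collation table and the default handling of punctuation.

```perl
use strict;
use warnings;
use Unicode::Collate;

# Hypothetical helper: compare two whole strings at a given collation level.
sub matches_at_level {
    my ($level, $left, $right) = @_;
    my $collator = Unicode::Collate->new(normalization => undef, level => $level);
    return $collator->eq($left, $right) ? 1 : 0;
}

my $accented = "came\x{301}l";    # "camél" written with a combining acute accent

# Print 1 (match) or 0 (no match) for each comparison against "camel".
for my $level (1 .. 3) {
    printf "level %d: Camel=%d accented=%d hyphenated=%d\n",
        $level,
        matches_at_level($level, "camel", "Camel"),
        matches_at_level($level, "camel", $accented),
        matches_at_level($level, "camel", "c-a-m-e-l");
}
```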
To make camel match camél but not Camel, another workaround is
needed. In the next release, a new parameter (ignore_level2) will allow it.
(However, the behavior of ignore_level2 is quite different from the
so-called caseLevel in UCA etc.)
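
Until then, one possible workaround (a sketch only, not part of Unicode::Collate itself) is to keep matching at level 1 and let the replacement callback decide which matches to keep. The helper name gsubst_case_sensitive is hypothetical. It assumes that removing combining marks after NFD decomposition is an acceptable way to ignore accents for your data, and it leaves any match containing punctuation (such as "c-a-m-e-l") untouched.

```perl
use strict;
use warnings;
use Unicode::Collate;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: find candidates at level 1 (case and accents ignored),
# then wrap only those whose base letters equal $word exactly.
sub gsubst_case_sensitive {
    my ($str, $word) = @_;
    my $collator = Unicode::Collate->new(normalization => undef, level => 1);
    $collator->gsubst($str, $word, sub {
        my $match = shift;
        # Decompose, then drop combining marks so "camél" becomes "camel".
        (my $base = NFD($match)) =~ s/\p{Mn}+//g;
        # Case-sensitive check; non-matching candidates are returned unchanged.
        return $base eq $word ? "#$match#" : $match;
    });
    return $str;
}

my $str = "Camel camel donkey zebra came\x{301}l CAMEL horse cAmEL...";
my $result = gsubst_case_sensitive($str, "camel");

binmode(STDOUT, ":encoding(UTF-8)");
print "$result\n";
```

The collator still does the accent-insensitive search, so the only extra work is the plain eq test in the callback.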
Regards,
SADAHIRO Tomoyuki