On Wed, Apr 7, 2010 at 17:42, Aristotle Pagaltzis <pagaltzis(_at_)gmx(_dot_)de> wrote:
* Michael Ludwig <michael(_dot_)ludwig(_at_)xing(_dot_)com> [2010-04-07 15:00]:
Having read Juerd's list of useful advice, I don't understand
the reason for its last three items:
• utf8::upgrade before doing lc/lcfirst/uc
• utf8::upgrade before doing case insensitive matching
• utf8::upgrade before matching predefined character classes
  like \w and \s
Can anyone enlighten me on the background of using
utf8::upgrade here?
Perl versions up to the upcoming 5.12.0 (I think) are buggy in
that they apply ISO-8859-1 semantics to downgraded strings and
Unicode semantics to upgraded strings [...]
The fix for this was withdrawn from 5.12.0. Currently you have to "use
feature 'unicode_strings'" to get the sane behaviour in the current
lexical scope. The current 'perldoc perlunicode' also says:
    The "use feature 'unicode_strings'" pragma is intended to
    always, regardless of platform, force Unicode semantics in
    a particular lexical scope. In release 5.12, it is
    partially implemented, applying only to case changes. See
    "The "Unicode Bug"" below.
This means that the utf8::upgrade() advice also applies to perl-5.12.0.
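To make the difference concrete, here is a minimal sketch, assuming a perl with no 'unicode_strings' feature, no "use utf8" and no locale in effect for the first three checks. It compares a downgraded string with an upgraded copy for each of the three items on Juerd's list. The variable names are only illustrative, and the last check needs 5.12 or later for the pragma.

```perl
use strict;
use warnings;
# Deliberately no "use utf8", no "use v5.12" (or later) and no /u flag here:
# any of those would turn on Unicode semantics and hide the effect.

# U+00E9 (e with acute accent) as a downgraded, one-byte string.
my $native = "\xE9";

# A copy holding the same character, but with upgraded internal storage.
my $upgraded = $native;
utf8::upgrade($upgraded);

# The two strings compare equal; only the internal representation differs.
my $same_data = $native eq $upgraded ? 1 : 0;

# Predefined character class.
my $native_word   = $native   =~ /\w/ ? 1 : 0;
my $upgraded_word = $upgraded =~ /\w/ ? 1 : 0;

# Case-insensitive match against U+00C9 (E with acute accent).
my $native_ci   = $native   =~ /\xC9/i ? 1 : 0;
my $upgraded_ci = $upgraded =~ /\xC9/i ? 1 : 0;

# Case change: does uc() produce U+00C9?
my $native_uc   = uc($native)   eq "\xC9" ? 1 : 0;
my $upgraded_uc = uc($upgraded) eq "\xC9" ? 1 : 0;

# Same case change on the downgraded string, inside the pragma's scope.
my $scoped_uc;
{
    use feature 'unicode_strings';
    $scoped_uc = uc($native) eq "\xC9" ? 1 : 0;
}

printf "same data: %d\n", $same_data;
printf "%-10s native=%d upgraded=%d\n", '\w', $native_word, $upgraded_word;
printf "%-10s native=%d upgraded=%d\n", '/i', $native_ci,   $upgraded_ci;
printf "%-10s native=%d upgraded=%d\n", 'uc', $native_uc,   $upgraded_uc;
printf "%-10s native=%d\n", 'uc+pragma', $scoped_uc;
```

Where the bug is present I would expect native=0 and upgraded=1 on each of the three middle lines, although both strings hold the same data, and 1 on the last line because the pragma covers case changes.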
Regards,
Gisle
[...], even when they contain the
same data. By upgrading your strings, you make sure that you get
Unicode semantics consistently.
Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>