perl-unicode

Re: Use case for utf8::upgrade?

2010-04-08 10:28:20
On Wed, Apr 7, 2010 at 17:42, Aristotle Pagaltzis <pagaltzis(_at_)gmx(_dot_)de> wrote:
* Michael Ludwig <michael(_dot_)ludwig(_at_)xing(_dot_)com> [2010-04-07 15:00]:
Having read Juerd's list of useful advice, I don't understand
the reason for its last three items:

• utf8::upgrade before doing lc/lcfirst/uc
• utf8::upgrade before doing case insensitive matching
• utf8::upgrade before matching predefined character classes
  like \w and \s

Can anyone enlighten me on the background of using
utf8::upgrade here?

Perl versions up to the upcoming 5.12.0 (I think) are buggy in
that they apply ISO-8859-1 semantics to downgraded strings and
Unicode semantics to upgraded strings

This fix was withdrawn from 5.12.0.  Currently you have to "use
feature 'unicode_strings'" to get the sane behaviour in the current
lexical scope.  Current 'perldoc perlunicode' also says:

       The "use feature 'unicode_strings'" pragma is intended to
       always, regardless of platform, force Unicode semantics in
       a particular lexical scope.  In release 5.12, it is
       partially implemented, applying only to case changes.  See
       "The "Unicode Bug"" below.

This means that the utf8::upgrade() advice also applies to perl-5.12.0.
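
A minimal sketch of the lexical scoping described above, assuming perl 5.12 or later and no locale pragma in effect. The variable names are only illustrative. It applies uc to the same downgraded string outside and inside a block that enables the feature:

```perl
use strict;
use warnings;

# "\xE9" is U+00E9 (e with acute). A plain literal like this is
# stored downgraded, i.e. one byte per character.
my $s = "\xE9";

# Outside the feature's scope: uc leaves the character unchanged.
my $plain = uc $s;

my $scoped;
{
    # Unicode semantics for case changes, in this block only.
    use feature 'unicode_strings';
    $scoped = uc $s;
}

printf "without feature: U+%04X\n", ord $plain;    # U+00E9
printf "with feature:    U+%04X\n", ord $scoped;   # U+00C9
```

The string is never upgraded here; only the scope in which uc is called differs.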

Regards,
Gisle


                                        , even when they contain the
same data. By upgrading your strings, you make sure that you get
Unicode semantics consistently.
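
A small sketch of the difference, assuming a perl without 'unicode_strings' or a locale pragma in scope. The variable names are only illustrative. Two strings holding the same single character give different results for uc and \w depending on their internal representation:

```perl
use strict;
use warnings;

# The same character, U+00E9 (e with acute), in both strings.
my $down = "\xE9";      # stored downgraded by default
my $up   = "\xE9";
utf8::upgrade($up);     # same characters, UTF-8 internal storage

# The strings still compare equal ...
my $same = $down eq $up ? 1 : 0;

# ... but case mapping and \w treat them differently.
my $uc_down = uc $down;
my $uc_up   = uc $up;
my $w_down  = $down =~ /\w/ ? 1 : 0;
my $w_up    = $up   =~ /\w/ ? 1 : 0;

printf "equal:         %d\n", $same;                # 1
printf "uc downgraded: U+%04X\n", ord $uc_down;     # U+00E9
printf "uc upgraded:   U+%04X\n", ord $uc_up;       # U+00C9
printf "\\w downgraded: %d\n", $w_down;             # 0
printf "\\w upgraded:   %d\n", $w_up;               # 1
```

Calling utf8::upgrade on a string before lc/uc, case-insensitive matching, or \w and \s puts it on the second path every time.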

Regards,
--
Aristotle Pagaltzis // <http://plasmasturm.org/>

