perl-unicode

RE: utf8 pragma, lexical scope

2010-09-09 15:13:32
On Thu, 09 Sep 2010, Michael Ludwig wrote:

> What does not work, however, is to have a variable $käse under utf8
> and then try to refer to it from inside a "no utf8" block, using either
> encoding. Without the utf8 pragma, identifiers are not allowed to have
> funny characters. (Yes, it was a stupid exercise.)

The Perl parser is not UTF-8 clean internally, so I would recommend not
using non-ASCII characters in variable names for now, even if it looks
like it mostly works under "use utf8".

From perltodo.pod:

| =head2 Properly Unicode safe tokeniser and pads.
|
| The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
| variable names are stored in stashes as raw bytes, without the utf-8 flag
| set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
| tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
| source filters.  All this could be fixed.
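
A minimal sketch of that recommendation, assuming the source file is saved
as UTF-8: keep identifiers ASCII-only and put the non-ASCII text into string
data instead, which "use utf8" decodes into characters. The names $kaese,
%stock and char_count are made up for illustration.

```perl
use strict;
use warnings;
use utf8;    # tells perl that this source text is UTF-8 encoded

# Hypothetical names: identifiers stay ASCII, only the data is non-ASCII.
my $kaese = 'Käse';             # string literal, decoded to characters
my %stock = ( 'käse' => 3 );    # hash keys are ordinary strings, not identifiers

# Returns the number of characters (not bytes) in a decoded string.
sub char_count { return length $_[0] }

binmode STDOUT, ':encoding(UTF-8)';    # encode characters again on output
printf "%s: %d characters, %d in stock\n",
    $kaese, char_count($kaese), $stock{'käse'};
```

Because the non-ASCII text lives only in strings, nothing here depends on how
the tokeniser, the stashes or the pads store identifier names.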

Cheers,
-Jan

