RE: utf8 pragma, lexical scope

On Thu, 09 Sep 2010, Michael Ludwig wrote:


> What does not work, however, is to have a variable $käse under utf8
> and then try to refer to it from inside a "no utf8" block, using either
> encoding. Without the utf8 pragma, identifiers are not allowed to have
> funny characters. (Yes, it was a stupid exercise.)


The Perl parser is not UTF-8 clean internally, so for now I would
recommend not using non-ASCII characters in variable names, even if it
looks like it mostly works under "use utf8".

From perltodo.pod:


| =head2 Properly Unicode safe tokeniser and pads.
|
| The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
| variable names are stored in stashes as raw bytes, without the utf-8 flag
| set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
| tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
| source filters.  All this could be fixed.
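
As a rough illustration of the "raw bytes" point, here is a minimal sketch
(not the tokeniser's actual code path). It assumes the script is saved as
UTF-8 and uses only the core Encode module; the variable names are made up
for the example. It shows that the same four-character name turns into
different byte strings depending on the source encoding, so a lookup keyed
on raw bytes cannot match across "use utf8" and "no utf8" code:

```perl
use strict;
use warnings;
use utf8;                      # this file itself is assumed to be UTF-8
use Encode qw(encode);

# The identifier from the example above, as a character string.
my $name  = "käse";

# What the name looks like as raw bytes in a UTF-8 source file ...
my $utf8  = encode('UTF-8', $name);        # ä becomes 0xC3 0xA4
# ... and in a Latin-1 source file.
my $latin = encode('ISO-8859-1', $name);   # ä becomes 0xE4

printf "chars=%d utf8_bytes=%d latin1_bytes=%d\n",
    length($name), length($utf8), length($latin);

# Two different byte strings, so they cannot name the same stash entry.
print "byte strings differ\n" if $utf8 ne $latin;
```

This should print "chars=4 utf8_bytes=5 latin1_bytes=4" followed by
"byte strings differ".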

Cheers,
-Jan
