On Thu, 09 Sep 2010, Michael Ludwig wrote:
> What does not work, however, is to have a variable $käse under utf8
> and then try to refer to it from inside a "no utf8" block, using either
> encoding. Without the utf8 pragma, identifiers are not allowed to have
> funny characters. (Yes, it was a stupid exercise.)
The Perl parser is internally not UTF-8 clean, so I would recommend not
using non-ASCII characters in variable names for now, even if it looks
like it mostly works under "utf8".
From perltodo.pod:
| =head2 Properly Unicode safe tokeniser and pads.
|
| The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a hack -
| variable names are stored in stashes as raw bytes, without the utf-8 flag
| set. The pad API only takes a C<char *> pointer, so that's all bytes too. The
| tokeniser ignores the UTF-8-ness of C<PL_rsfp>, or any SVs returned from
| source filters. All this could be fixed.
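To make the recommendation concrete, here is a minimal sketch of the
safe pattern: keep identifiers ASCII-only and put the non-ASCII text
into data (string literals, hash keys) instead. The names $kaese and
%price are made up for illustration, and the sketch assumes the source
file itself is saved as UTF-8.

```perl
use strict;
use warnings;
use utf8;                               # string literals below are UTF-8 source text
binmode STDOUT, ':encoding(UTF-8)';     # encode characters on output

# ASCII-only identifier (hypothetical name); the non-ASCII text is data.
my $kaese = 'Käse';

# Hash keys are ordinary strings, so non-ASCII characters are fine here.
my %price = ( 'Käse' => 2.50 );

printf "%s: %.2f\n", $kaese, $price{$kaese};
```

Nothing non-ASCII ends up in a symbol name this way, so the code does
not depend on how the tokeniser, the stashes or the pads handle UTF-8.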
Cheers,
-Jan