perl-unicode

Re: utf8 pragma, lexical scope

2010-09-10 01:21:43
Jan Dubois wrote on 09.09.2010 at 13:13 (-0700):
>> Without the utf8 pragma, identifiers are not allowed to have
>> funny characters. (Yes, it was a stupid exercise.)
>
> The Perl parser is internally not UTF8-clean, so I would recommend
> not to use non-ASCII characters in variable names for now, even if
> it looks like it mostly works under "utf8".

Okay. I can certainly get by without non-ASCII variable names.
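
For the record, the lexical scoping from the subject line can be seen
with string literals alone, leaving identifiers out of it. This is only
a minimal sketch. It assumes the script is saved as UTF-8 and that "é"
is the single precomposed code point U+00E9. The variable names are
made up for illustration.

```perl
use strict;
use warnings;

# No "use utf8" in effect here: the literal is read as two raw
# bytes (0xC3 0xA9), assuming the file is saved as UTF-8.
my $outside = "é";

my $inside;
{
    use utf8;          # in effect only until the closing brace
    $inside = "é";     # decoded into one character, U+00E9
}

# Past the closing brace the pragma is off again.
my $after = "é";

# Lengths only, so no wide character is ever written to STDOUT.
printf "outside: %d, inside: %d, after: %d\n",
    length($outside), length($inside), length($after);
```

Under those assumptions the lengths should come out as 2, 1 and 2, so
the pragma affects only the literal inside the block.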

> From perltodo.pod:
>
> | =head2 Properly Unicode safe tokeniser and pads.
> |
> | The tokeniser isn't actually very UTF-8 clean. C<use utf8;> is a
> | hack - variable names are stored in stashes as raw bytes, without
> | the utf-8 flag set. The pad API only takes a C<char *> pointer,
> | so that's all bytes too. The tokeniser ignores the UTF-8-ness of
> | C<PL_rsfp>, or any SVs returned from source filters.  All this
> | could be fixed.

Thanks - I didn't know this doc.
-- 
Michael Ludwig
