On Sat, Jun 12, 1999 at 11:29:31AM -0700, Gurusamy Sarathy wrote:
> It seems better to have both C<use utf8> and C<use locale> have
> a global effect, and add a lexically scoped C<use byte> (or
> similar) that will mark places that operate on binary data,
> and therefore turn off any encoding related pragmata. What
> do you think?
>
> Again, the one thing I don't like about the above is that
> code that operates on binary data will silently fail to do the
> right thing when the user enables the global switch. I'm
> considering doing it iff we can find some way to warn users
> that the data may not match the mode in effect.
Do you warn the user that her C<use locale> does nothing if the script
uses any module? If somebody writes

    use glocale; # or locale ':global'

then they know what they are doing: it is their responsibility to
check that all the modules are safified ;-) by C<use byte>, if needed.
The only addition: being locale-sensitive and UTF-8 encoded is a
property of *data*, not of a Perl script. An attempt to handle these
properties by marking sections of code may be noble, but it looks like
a lost cause.
> The C<use byte> I'm talking about has little to do with data.
> Perl *operations* behave *differently* depending on the mode in
> effect, so C<use byte> would simply be a way to mark sections
> where string operations should affect bytes rather than characters.
How is this related to what I wrote? You cannot make the

    to utf8 or not to utf8

decision based on some property of a *program/subroutine*. It is
a property of the arguments to a subroutine, not of the subroutine
itself.
> But finding a way to easily "mark" data (by marking its source?)
> has potential, because then we can detect mismatch between data
> and operations. We may even be able to piggyback on taint magic
> for propagating the type.
I do not follow you: if we can detect a mismatch, why not do "the
right thing" instead of complaining? (Of course, for performance
reasons it should be possible to switch this off.)
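(For what it is worth, "doing the right thing" on a mismatch is roughly
what perl settled on: when a byte string and a character string meet in
one operation, the byte string is silently upgraded, its bytes being
read as Latin-1, so the result is consistent. A sketch on a modern
perl:)

```perl
use strict;
use warnings;

my $bytes = "caf\xE9";          # 4 Latin-1 bytes, internal UTF8 flag off
my $chars = "\x{263A}";         # 1 character, internal UTF8 flag on

my $joined = $bytes . $chars;   # mismatch: perl upgrades $bytes automatically

print length($joined), "\n";    # 5 characters, not a byte count
print utf8::is_utf8($joined) ? "upgraded\n" : "bytes\n";  # upgraded
```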