I'm surprised that we don't make the old functionality controlled by a
variable of some sort.
For example:
    $UNICODE_ENABLED = 0;
    shortcut: $:-( = 0;
where the default is 1, much like $/ has a default value.
(Hmm, I like the multibyte smiley convention; that would be
interesting.)
Of course, it would require revisiting existing functionality, but it
does help forward migration.
BTW, I do agree that there are times when single-byte output is
required, for example when parsing or building legacy data files.
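To make the one-byte-versus-two point concrete, here is a minimal Perl 5 sketch (not from the thread) that writes chr(0xff) through a byte-oriented handle and through a UTF-8 text handle. The helper name bytes_written() and the use of in-memory filehandles are assumptions made purely for illustration; :raw and :utf8 are the standard PerlIO layers.

```perl
use strict;
use warnings;

# Write one string through a filehandle opened with the given layer and
# return the bytes that actually landed in the "file".
# bytes_written() is a hypothetical helper, for illustration only.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer)
        or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh) or die "cannot close in-memory handle: $!";
    return $buffer;
}

my $raw  = bytes_written(':raw',  chr(0xff));   # byte-oriented handle
my $utf8 = bytes_written(':utf8', chr(0xff));   # text handle, UTF-8 encoded

printf "raw:  %d byte(s): %s\n", length($raw),  unpack('H*', $raw);
printf "utf8: %d byte(s): %s\n", length($utf8), unpack('H*', $utf8);
```

On a Perl built with PerlIO (5.8 and later), the raw handle should receive the single byte ff and the :utf8 handle the two bytes c3 bf, which is the difference the quoted message is about.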
-----Original Message-----
From: Larry Wall [mailto:larry(_at_)wall(_dot_)org]
Sent: Wednesday, May 05, 2004 12:39 PM
To: perl-unicode(_at_)perl(_dot_)org
Subject: Re: BOM and principle of least surprise
On Wed, May 05, 2004 at 07:47:53PM +0300, Jarkko Hietaniemi wrote:
: We tried this with perl 5.8.0 and the feedback was overwhelmingly
: negative... if people do "print chr 0xff" they do expect one byte,
: not two.
We'll just have to figure out how to retrain those people for
Perl 6. The binary/text distinction is important enough on
filehandles that I'm tempted to resort to different keywords
to open them, just to force people to be specific. 'Course,
that doesn't help with stdout...
A non-serious suggestion is that we pull the same trick and
force people to specify by name whether they're running a
version of Perl that defaults to text or binary, so they have
to invoke #!/usr/bin/tperl or #!/usr/bin/bperl. But
realistically, Perl is going to continue to default to text
mode, which means we have to make it simple to switch to
"binary-filter mode" on stdin and stdout. (Presumably stderr
remains textish.)
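[Editorial sketch, not part of the original message.] In Perl 5 as it stands, one way to get a "binary-filter mode" is to call binmode on both handles before copying. The function name binary_filter() and the 4096-byte chunk size are assumptions for illustration, and this is only one possible shape for such a switch, not the Perl 6 design being discussed.

```perl
use strict;
use warnings;

# Copy bytes from one handle to another with no decoding or encoding.
# binary_filter() is a hypothetical name used only for this sketch.
sub binary_filter {
    my ($in, $out) = @_;
    binmode($in)  or die "binmode on input failed: $!";
    binmode($out) or die "binmode on output failed: $!";
    my $chunk;
    while (1) {
        my $n = read($in, $chunk, 4096);   # chunk size is arbitrary
        die "read failed: $!" unless defined $n;
        last if $n == 0;                   # end of input
        print {$out} $chunk or die "write failed: $!";
    }
    return;
}

# Typical use as a filter, leaving STDERR untouched for text diagnostics:
# binary_filter(\*STDIN, \*STDOUT);
```

Passing the handles in as arguments keeps the sketch testable with in-memory handles; a real filter would simply pass STDIN and STDOUT.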
Right now, the meaning of "text" is subject to severe
distortions due to legacy issues. But in the long run,
"text" is going to mean Unicode, and that probably means a
UTF-8 file encoding at least in the western world, and
probably something else in the eastern world (but in either
case, the file encoding is hidden behind the filehandle as
far as Perl is concerned). The P5-to-P6 transition is a
reasonable time to bite that bullet. Not sure how much fence
straddling Ponie can do though...
It would be really nice if the OS guys would get their
collective act together and provide APIs to tell programs
what their expectations are, but the next few years are
likely to remain chaotic in that respect, and we're likely to
see many "bandaids" in the form of environment variables,
which are the wrong solution to most API problems. But we
have to aim Perl 6 for the time when things will eventually
settle down. That doesn't mean that we have to make Unicode
the default right away if we decide it's impossible to "bite
the bullet" right now--but it does mean at minimum that there
shouldn't be any default other than Unicode, and if we force
people to be explicit right now, there has to be a way of
simplifying the API later when it's reasonable to default to
Unicode. That's probably the transitional form that Perl 5
and Ponie have to go through anyway, even if Perl 6 manages
to force the issue.
Larry