perl-unicode

RE: BOM and principle of least surprise

2004-05-06 02:30:06
I'm surprised that we don't make the old functionality controlled by a
variable of some sort.

i.e.

$UNICODE_ENABLED = 0;

shortcut:   $:-( = 0;

where the default is 1.   much like:  $/   

(hmm, I like the multibyte smiley convention, that would be
interesting)

of course it will require a visit to existing functionality but it does
help the forward migration.

btw: I do agree that there are times when a single byte output is
required.  for example when parsing/building legacy data files. 
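
To make the legacy-data-file case concrete, here is a minimal sketch of building and parsing a single-byte record. The record layout (a 10-byte space-padded name followed by a 1-byte flag) and the names build_record and parse_record are made up for illustration; only Encode's encode/decode and the core pack/unpack functions are real.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Build one fixed-width legacy record: a 10-byte name field followed by a
# 1-byte flag. This layout is hypothetical, chosen only for the example.
sub build_record {
    my ($name, $flag) = @_;
    my $name_bytes = encode('iso-8859-1', $name);   # one byte per character
    return pack('A10 C', $name_bytes, $flag);       # space-padded name, unsigned byte
}

# Reverse of build_record: split the bytes, then decode the name back
# into a Perl character string.
sub parse_record {
    my ($record) = @_;
    my ($name_bytes, $flag) = unpack('A10 C', $record);
    return (decode('iso-8859-1', $name_bytes), $flag);
}
```

The point of the sketch is that the encoding step is explicit: the program works with characters internally and chooses one byte per character only at the edge where the file format demands it.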


-----Original Message-----
From: Larry Wall [mailto:larry(_at_)wall(_dot_)org] 
Sent: Wednesday, May 05, 2004 12:39 PM
To: perl-unicode(_at_)perl(_dot_)org
Subject: Re: BOM and principle of least surprise


On Wed, May 05, 2004 at 07:47:53PM +0300, Jarkko Hietaniemi wrote:
: We tried this with perl 5.8.0 and the feedback was overwhelmingly
: negative...  if people do "print chr 0xff" they do expect one byte,
: not two.

We'll just have to figure out how to retrain those people for 
Perl 6. The binary/text distinction is important enough on 
filehandles that I'm tempted to resort to different keywords 
to open them, just to force people to be specific.  'Course, 
that doesn't help with stdout...
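
The binary/text distinction can already be seen in Perl 5 by choosing a PerlIO layer when opening a filehandle. The following is only a sketch using in-memory filehandles; the helper name bytes_written is an assumption made for this example. It shows the same "print chr 0xff" producing one byte on a raw handle and two bytes on a handle with a UTF-8 encoding layer.

```perl
use strict;
use warnings;

# Write $text to an in-memory filehandle opened with the given PerlIO layer
# and return the bytes that ended up in the "file".
# (bytes_written is a hypothetical helper, not part of any module.)
sub bytes_written {
    my ($layer, $text) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer)
        or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close($fh) or die "close failed: $!";
    return $buffer;
}

my $char = chr 0xff;    # one character, U+00FF

my $raw  = bytes_written(':raw', $char);              # binary handle
my $utf8 = bytes_written(':encoding(UTF-8)', $char);  # text handle, UTF-8 on disk

printf "raw:   %d byte(s)\n", length $raw;
printf "utf-8: %d byte(s)\n", length $utf8;
```

The layer is chosen per handle at open time, which is close in spirit to "different keywords to open them", although here it is a mode string and not a keyword.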

A non-serious suggestion is that we pull the same trick and 
force people to specify by name whether they're running a 
version of Perl that defaults to text or binary, so they have 
to invoke #!/usr/bin/tperl or #!/usr/bin/bperl.  But 
realistically, Perl is going to continue to default to text 
mode, which means we have to make it simple to switch to 
"binary-filter mode" on stdin and stdout. (Presumably stderr 
remains textish.)
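
In Perl 5 the closest existing tool for a "binary-filter mode" is binmode on the standard handles. A minimal sketch follows, assuming a plain byte-for-byte copy is what the filter wants; the names make_binary_filter and copy_bytes are hypothetical, and stderr is deliberately left alone.

```perl
use strict;
use warnings;

# Put both handles into binary mode: no CRLF translation, no encoding layer.
# (make_binary_filter is a hypothetical helper name.)
sub make_binary_filter {
    my ($in, $out) = @_;
    binmode($in)  or die "binmode on input failed: $!";
    binmode($out) or die "binmode on output failed: $!";
    return;
}

# Copy input to output in fixed-size blocks, leaving every byte untouched.
sub copy_bytes {
    my ($in, $out) = @_;
    make_binary_filter($in, $out);
    while (read($in, my $chunk, 8192)) {
        print {$out} $chunk;
    }
    return;
}

# Typical use as a filter:
#   copy_bytes(\*STDIN, \*STDOUT);
```

This shows how small the switch can be when the default is text: one call per handle, made before any I/O happens on it.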

Right now, the meaning of "text" is subject to severe 
distortions due to legacy issues.  But in the long run, 
"text" is going to mean Unicode, and that probably means a 
UTF-8 file encoding at least in the western world, and 
probably something else in the eastern world (but in either 
case, the file encoding is hidden behind the filehandle as 
far as Perl is concerned).  The P5-to-P6 transition is a 
reasonable time to bite that bullet.  Not sure how much fence 
straddling Ponie can do though...

It would be really nice if the OS guys would get their 
collective act together and provide APIs to tell programs 
what their expectations are, but the next few years are 
likely to remain chaotic in that respect, and we're likely to 
see many "bandaids" in the form of environment variables, 
which are the wrong solution to most API problems.  But we 
have to aim Perl 6 for the time when things will eventually 
settle down.  That doesn't mean that we have to make Unicode 
the default right away if we decide it's impossible to "bite 
the bullet" right now--but it does mean at minimum that there 
shouldn't be any default other than Unicode, and if we force 
people to be explicit right now, there has to be a way of 
simplifying the API later when it's reasonable to default to 
Unicode.  That's probably the transitional form that Perl 5 
and Ponie have to go through anyway, even if Perl 6 manages 
to force the issue.

Larry


