perl-unicode

Re: My favorite bug to fix for 5.8.0

2002-03-09 18:04:54
On Sat, Mar 09, 2002 at 11:59:29AM -0800, Larry Wall wrote:
In Markus's lovely http://www.cl.cam.ac.uk/~mgk25/unicode.html document,
he writes:

    On POSIX systems, the selected locale identifies already the encoding
    expected in all input and output files of a process.

Perl currently violates this, and I'm getting very tired very quickly
of having to put things like

    eval {
      binmode IN, ":utf8";
      binmode STDIN, ":utf8";
      binmode STDOUT, ":utf8";
    };

in my programs, despite running in a LANG=en_US.UTF-8 locale with a
UTF-8 aware xterm and a UTF-8 aware editor.  What will it take to fix
that?  Not much, I think.

I started doing this at one point by making Perl understand the
langinfo(CODESET) thingy (I18N::Langinfo), but I was somewhat
disheartened by the sucky support of langinfo() across platforms
and walked away in disgust.

However, even currently (in 5.7.3) one can say use open ':locale', and
if your langinfo() returns something matching /utf-?8/i you will get
automagic utf8-fication on your I/O (not on your STDIN and STDOUT,
though).  At least, that was the theory and plan.
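
For the STDIN/STDOUT gap, something along these lines could be done by
hand.  This is only a rough sketch, not what the open pragma does
internally: codeset_is_utf8() and locale_codeset() are hypothetical
helper names made up for illustration, and it assumes langinfo(CODESET)
works on your platform (the eval covers the case where it does not).

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use I18N::Langinfo qw(langinfo CODESET);

# Hypothetical helper: true if a codeset name looks like UTF-8,
# using the same /utf-?8/i test mentioned above.
sub codeset_is_utf8 {
    my ($codeset) = @_;
    return (defined($codeset) && $codeset =~ /utf-?8/i) ? 1 : 0;
}

# Hypothetical helper: ask the C library which codeset the current
# locale uses.  Returns undef if langinfo() is not usable here.
sub locale_codeset {
    setlocale(LC_CTYPE, "");    # adopt the locale from the environment
    return eval { langinfo(CODESET) };
}

# Only touch the standard handles when the locale says UTF-8.
if (codeset_is_utf8(locale_codeset())) {
    binmode($_, ":utf8") for \*STDIN, \*STDOUT;
}
```

Under LANG=en_US.UTF-8 this should put the :utf8 layer on both
handles; under a non-UTF-8 locale it leaves them alone.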

Markus, what's your take on this?  Do you think open by default should
try to Do the Right Thing?  I'm trying to balance out the needs of
neophytes with experts here.  Perhaps this is another of those things
that should work differently under C<use strict>.  But it's so
pitifully easy to distinguish UTF-8 from ISO-8859-1 that it seems like
that should almost be mandatory.
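
To illustrate how cheap that distinction is, here is a minimal sketch
using a strict decode from the Encode module.  The helper name
looks_like_utf8() is made up for this example, and this is just one
possible way to check, not a proposal for how perl itself should do it.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the byte string is well-formed UTF-8.
sub looks_like_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a check value may modify its input
    my $ok = eval { Encode::decode("UTF-8", $copy, Encode::FB_CROAK()); 1 };
    return $ok ? 1 : 0;
}

my $utf8   = "na\xC3\xAFve";    # "naive" with i-diaeresis, UTF-8 bytes
my $latin1 = "na\xEFve";        # the same word as ISO-8859-1 bytes

print looks_like_utf8($utf8)   ? "UTF-8\n" : "not UTF-8\n";
print looks_like_utf8($latin1) ? "UTF-8\n" : "not UTF-8\n";
```

Pure ASCII passes this test too, which is harmless since ASCII reads
the same in both encodings.  ISO-8859-1 text with high-bit characters
almost never happens to form valid UTF-8 sequences.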

But the first step is recognizing UTF-8 locales.

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen