perl-unicode

My favorite bug to fix for 5.8.0

2002-03-09 13:01:25
In Markus's lovely http://www.cl.cam.ac.uk/~mgk25/unicode.html document,
he writes:

    On POSIX systems, the selected locale identifies already the encoding
    expected in all input and output files of a process.

Perl currently violates this, and I'm getting very tired very quickly
of having to put things like

    eval {
        binmode IN, ":utf8";
        binmode STDIN, ":utf8";
        binmode STDOUT, ":utf8";
    };

in my programs, despite running in a LANG=en_US.UTF-8 locale with a
UTF-8 aware xterm and a UTF-8 aware editor.  What will it take to fix
that?  Not much, I think.
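
For what it's worth, here is a rough sketch of what "recognizing a UTF-8 locale" could look like if done by hand at the top of a script. It is only an illustration, not a proposal for how perl itself should do it: the helper name locale_is_utf8 is made up, and it assumes that looking at LC_ALL, LC_CTYPE, and LANG (in that order of precedence) and matching a ".UTF-8" or ".utf8" codeset suffix is good enough.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): returns 1 if the environment
# names a UTF-8 locale, 0 otherwise.  The first of LC_ALL, LC_CTYPE,
# LANG that is set and non-empty decides, following POSIX precedence.
sub locale_is_utf8 {
    my ($env) = @_;
    $env ||= \%ENV;                      # default to the real environment
    for my $var (qw(LC_ALL LC_CTYPE LANG)) {
        my $value = $env->{$var};
        next unless defined $value && length $value;
        # Match a codeset such as ".UTF-8" or ".utf8", optionally
        # followed by a modifier like "@euro".
        return $value =~ /\.utf-?8(?:@|$)/i ? 1 : 0;
    }
    return 0;                            # nothing set: assume not UTF-8
}

# Apply the :utf8 layer to the standard handles only in a UTF-8 locale.
if (locale_is_utf8()) {
    binmode $_, ":utf8" for \*STDIN, \*STDOUT, \*STDERR;
}
```

This only covers the standard handles. Files opened later would still need their own layer, which is exactly the repetition I'd like to see go away.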

In the more-difficult-but-oh-so-user-friendly category, it would also
be lovely if someone came up with a dwimmish layer that could recognize
when it isn't getting UTF-8 and attempt autorecognition of other
encodings, perhaps with hints from the locale.  Camel III called it
:any, but maybe :guess would be better documentation.  Then saying
C<use open ":guess"> could just dwim all the opens.  There are arguments
both for and against making that the default.  After all, just because
you've set a UTF-8 locale doesn't actually mean that all the files you
receive are in that format.  It has to be at least easy to turn on
guessing, even if that's not the default.  But if we do want to establish
guessing as a default, then the transition to widespread use of UTF-8
locales is probably our only chance.

Markus, what's your take on this?  Do you think open by default should
try to Do the Right Thing?  I'm trying to balance out the needs of
neophytes with experts here.  Perhaps this is another of those things
that should work differently under C<use strict>.  But it's so
pitifully easy to distinguish UTF-8 from ISO-8859-1 that it seems like
that should almost be mandatory.
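
To illustrate how little it takes, here is a minimal sketch, assuming the Encode module is available and that the only two candidates are UTF-8 and ISO-8859-1. The function name guess_encoding is invented for the example. The idea is simply that Latin-1 text containing high-bit characters almost never happens to form well-formed UTF-8 sequences, so a strict decode that succeeds is strong evidence of UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess whether a byte string is UTF-8 or
# ISO-8859-1.  Pure ASCII input is reported as UTF-8, since ASCII is
# a subset of both encodings.
sub guess_encoding {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode with a CHECK value may modify its argument
    # FB_CROAK makes decode die on any malformed sequence.
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };
    return $ok ? 'UTF-8' : 'ISO-8859-1';
}

print guess_encoding("caf\xC3\xA9"), "\n";   # well-formed UTF-8 bytes
print guess_encoding("caf\xE9"), "\n";       # lone 0xE9 is malformed UTF-8
```

This is a heuristic, not a proof: a short Latin-1 string can by coincidence be valid UTF-8, and the fallback says nothing about other 8-bit encodings, which is where hints from the locale would come in.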

But the first step is recognizing UTF-8 locales.

Larry