perl-unicode

use encoding 'utf8' bug/shortcoming?

2005-04-13 15:33:59
I have tried the following script on perl-5.8.[356], and I was wondering
whether someone could confirm that this is a bug or a known shortcoming of
use encoding 'utf8';

From the encoding docs:

Implicit upgrading for byte strings

By default, if strings operating under byte semantics and strings with
Unicode character data are concatenated, the new string will be
created by decoding the byte strings as ISO 8859-1 (Latin-1).

The encoding pragma changes this to use the specified encoding instead.
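
As a rough illustration of the default behaviour that passage describes, the sketch below concatenates a byte string with a character string and looks at the result. It deliberately does not load the encoding pragma, so it only shows the Latin-1 default; the variable names are invented for the example.

```perl
use strict;
use warnings;

# Byte string: the UTF-8 encoding of 'hÿper' (ÿ is U+00FF, bytes C3 BF).
my $bytes = "h\xC3\xBFper";

# Character string holding a code point above 0xFF.
my $chars = "\x{263A}";

# Concatenation upgrades $bytes. By default each byte becomes one
# Latin-1 character, so C3 and BF end up as two separate characters.
my $joined = $bytes . $chars;

printf "length: %d\n", length($joined);
printf "char 1: U+%04X\n", ord(substr($joined, 1, 1));
printf "char 2: U+%04X\n", ord(substr($joined, 2, 1));
```

If the bytes had been decoded as UTF-8 instead, the joined string would be one character shorter and character 1 would be U+00FF.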

In the case of C< print $x; > the $x string was decoded as latin1 even
though the C< use encoding 'utf8' > pragma is set.  No concatenation
happened here, but it got decoded anyway.

So is the problem that encoding has set STDOUT to utf8 mode, but PerlIO is
not respecting the encoding pragma, or what?
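
One way to see what a UTF-8 output layer does with a byte string, independent of the pragma, is to print to an in-memory filehandle and inspect the bytes that come out. This is only a sketch: `$buf`, `$fh` and `hex_bytes` are names made up here, and it uses a plain `:utf8` layer, which may not be exactly what `use encoding` installs on STDOUT.

```perl
use strict;
use warnings;

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $s;
}

# Byte string holding the UTF-8 encoding of 'hÿper', with no UTF8 flag,
# similar to $x after Encode::_utf8_off($x).
my $bytes = "h\xC3\xBFper";

# Print through a :utf8 layer into an in-memory buffer.
my $buf = '';
open(my $fh, '>:utf8', \$buf) or die "open failed: $!";
print {$fh} $bytes;
close($fh) or die "close failed: $!";

# The layer treats each input byte as a Latin-1 character and encodes it,
# so C3 BF comes out as C3 83 C2 BF (doubly-encoded).
print hex_bytes($buf), "\n";
```

This reproduces the doubly-encoded output of the `print $x;` case, which suggests the output layer alone is enough to cause it.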

~ John Williams
... still looking for the "just assume everything is utf8" mode ...


======== the test script =======

#!/usr/bin/perl

use utf8;
use encoding 'utf8';

$x = 'hÿper';

use Encode;
Encode::_utf8_off($x);

# prints doubly-encoded utf8: 'hÃ¿per'
print $x;
print "\n";

# prints 'hÿper' correctly
print "$x\n";




