I was under the assumption that:
use encoding 'utf8';
was equivalent to:
use utf8; # source in UTF-8
binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';
But that does not seem to be the case. Please consider and run
the following script:
use strict;
use warnings;
# use either (a)
use utf8; binmode STDOUT, ':utf8';
# or (b) to test
#use encoding 'utf8';
my @strings = (
"These should print okay on a UTF-8 terminal",
"that has the relevant glyphs:",
"\t\x{041c}\x{0438}\x{0440} - Russian space station", # Мир
"\tKäse - UTF-8 literal",
'---------------',
'Same for these:',
"\tK\x{e4}se - Unicode character escape \\x{e4}",
"\tK\x{00e4}se - \\x{00e4}, same thing",
'---------------',
'These should be double-encoded on a UTF-8 terminal:',
"\tK\x{c3}\x{a4}se - octets \\x{c3}\\x{a4}",
"\tK\xc3\xa4se - octets \\xc3\\xa4",
);
print "$_\n" for @strings;
It seems that (a) and (b) do not handle the \x{....} notation
identically as far as the Latin-1 Supplement (U+0080 to U+00FF)
is concerned. (Tried Perl 5.8.9 and 5.10.0.) What do you think?
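In case it helps to narrow this down, here is a minimal sketch of a check that could be run once under (a) and once under (b) to compare what Perl holds internally for each literal before anything is printed. The helper name describe is hypothetical and not part of the script above; it only reports the internal UTF8 flag and the code points of a string, and its output is plain ASCII, so it should not depend on any output layer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): report a string's
# internal UTF8 flag and the code point of each of its characters.
sub describe {
    my ($s) = @_;
    my $flag = utf8::is_utf8($s) ? 'on' : 'off';
    my $cps  = join ' ', map { sprintf('U+%04X', ord $_) } split //, $s;
    return "utf8 flag $flag: $cps";
}

# Add the pragmas from (a) or (b) at the top and compare the reports.
print describe("K\x{e4}se"), "\n";                 # Latin-1 Supplement escape
print describe("K\x{c3}\x{a4}se"), "\n";           # two octets written as escapes
print describe("\x{041c}\x{0438}\x{0440}"), "\n";  # code points above U+00FF
```

If the two variants report different code points for the same literal, the difference would lie in how the literals are interpreted rather than in the output layer.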
--
Michael.Ludwig (#) XING.com