Pierre Nugues wrote on 06.09.2010 at 22:02 (+0200):
2/ The output with "use utf8;"
This pragma tells the interpreter that your script source is in UTF-8.
So it affects the literals in your tr/// list. It does not tell the
interpreter what output encoding to use.
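A minimal sketch of what the pragma changes, assuming the script file itself is saved as UTF-8. The variable names are made up for illustration; the point is only that the same literal is one character with the pragma and two octets without it.

```perl
use strict;
use warnings;

# Same literal, compiled once with and once without the pragma.
my $with_pragma    = do { use utf8; "ä" };   # one character, U+00E4
my $without_pragma = do { no utf8;  "ä" };   # two octets, 0xC3 0xA4

printf "%d %d\n", length($with_pragma), length($without_pragma);   # prints "1 2"
```

Nothing here touches STDOUT's encoding; that still has to be set separately with binmode or the open pragma.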
3/ With
use utf8;
binmode(STDOUT, ':utf8');
I get the following. This time the terminal can display the <C2> as a Â,
but the result is still not correct: the accented characters are stripped:
The tr operator has probably stripped individual bytes out of the
multi-byte UTF-8 sequences.
4/ With binmode(STDOUT, ':utf8') only, I get a mix of wrongly encoded
quotes, which the terminal displays as Latin 1 or Latin 9, and accented
characters whose UTF-8 bytes are shown as Latin 1 or Latin 9 characters:
»Tjuvgömmare
!
»
säga
Your output is double-encoded. This is what happens here:
(1) You're reading text encoded as UTF-8 in binary mode.
(2) Consequently, you don't have text in Perl: you have octets.
(3) You're applying some butchery to the octets using the tr operator.
(4) You're writing out the remaining octets, and the :utf8 layer encodes
    each of them as UTF-8 a second time.
(5) You're seeing garbage on the screen.
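The steps above, and one way out of them, can be sketched with the core Encode module. This is an illustration under assumptions, not your actual script: the sample string and the tr list are invented, and the octets are hard-coded instead of being read from a file.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "sä" as UTF-8 octets, the way a handle in binary mode delivers it.
my $octets = "s\xC3\xA4";

# Steps (1) to (4): the octets are treated as characters and encoded again.
my $double = encode('UTF-8', $octets);

# Decode first, so that tr sees characters instead of octets.
my $text = decode('UTF-8', $octets);
(my $plain = $text) =~ tr/\x{E4}/a/;    # hypothetical tr list: ä becomes a

# Encode exactly once on the way out.
my $out = encode('UTF-8', $text);

print unpack('H*', $double), "\n";      # 73c383c2a4 (double-encoded)
print unpack('H*', $out), "\n";         # 73c3a4     (same as the input)
print "$plain\n";                       # sa
```

In a real script the decoding and encoding are usually left to I/O layers, for example binmode(STDIN, ':encoding(UTF-8)') and binmode(STDOUT, ':encoding(UTF-8)'), so that everything in between works on characters.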
--
Michael Ludwig