perl-unicode

Re: Workaround to a unicode bug needed

2010-09-06 15:27:16
Pierre Nugues wrote on 06.09.2010 at 22:02 (+0200):

2/ The output with "use utf8;"

This pragma tells the interpreter that your script source is in UTF-8.
So it affects the literals in your tr/// list. It does not tell the
interpreter what output encoding to use.
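
To illustrate, here is a minimal sketch of the difference the pragma makes
to a literal. The sample word and the variable names are mine, not from
your script. The bytes are written as escapes so the example does not
depend on how the file itself is saved, and decode() stands in for what
"use utf8;" does to a UTF-8 literal in the source.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "säga" as it sits in a UTF-8 source file: five octets.
# Without "use utf8;" Perl sees the literal like this.
my $octets = "s\xC3\xA4ga";

# With "use utf8;" Perl sees four characters instead.
# decode() is used here only to simulate that effect.
my $chars = decode('UTF-8', $octets);

printf "without use utf8: length %d\n", length $octets;   # 5
printf "with use utf8:    length %d\n", length $chars;    # 4
```

Neither variant says anything about STDOUT. The output encoding is set
separately, for example with binmode.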

3/ With 
use utf8;
binmode(STDOUT, ':utf8');
I get (this time, the terminal can display the <C2> as a Â. This is
not correct. It strips the accented characters):

The tr operator might have removed some bytes: the input was read as
octets, so tr works on single bytes, not on whole characters.
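
A sketch of how that can happen, under assumptions: I have not seen your
tr list, so the one below (deleting the guillemets U+00AB and U+00BB) is
hypothetical, as are the sample input and the helper hex_dump.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: show a string as space-separated hex values.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord $_ } split //, $s;
}

# Hypothetical input: "«säga»" as UTF-8 octets, read without decoding.
my $octets = "\xC2\xABs\xC3\xA4ga\xC2\xBB";

# On octets, tr removes the bytes AB and BB but leaves the lead bytes C2.
(my $butchered = $octets) =~ tr/\x{AB}\x{BB}//d;
print hex_dump($butchered), "\n";    # C2 73 C3 A4 67 61 C2

# On decoded text, the same tr removes whole characters.
(my $clean = decode('UTF-8', $octets)) =~ tr/\x{AB}\x{BB}//d;
print hex_dump($clean), "\n";        # 73 E4 67 61
```

The stray C2 bytes in the first result would fit the <C2> you mention.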

4/ With binmode(STDOUT, ':utf8') only (Then, there is a combination of
wrongly coded quotes in Latin 1 or Latin 9  that the terminal displays
and accented characters that are shown with their UTF-8 substitutes
interpreted as Latin 1 or Latin 9 characters);

»Tjuvgömmare
!
»
säga

Your output is double-encoded. This is what happens here:

(1) You're reading text encoded as UTF-8 in binary mode.
(2) Consequently, you don't have text in Perl: you have octets.
(3) You're applying some butchery to the octets using the tr operator.
(4) You're outputting the remaining octets through the :utf8 layer, which
    treats each octet as a Latin-1 character and encodes it as UTF-8 again.
(5) You're seeing garbage on the screen.
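
The steps above can be reproduced in a few lines. This is only a sketch:
the sample word and the helper hex_dump are hypothetical, and encode()
mimics what the :utf8 output layer does to a string of octets.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show a string as space-separated hex values.
sub hex_dump {
    my ($s) = @_;
    return join ' ', map { sprintf '%02X', ord $_ } split //, $s;
}

# Steps 1 and 2: "säga" read in binary mode, so these are octets.
my $octets = "s\xC3\xA4ga";

# Step 4: encoding the octets once more yields double-encoded output.
my $double = encode('UTF-8', $octets);
print hex_dump($double), "\n";       # 73 C3 83 C2 A4 67 61

# The way out: decode on input, work on characters, encode on output.
my $text       = decode('UTF-8', $octets);
my $round_trip = encode('UTF-8', $text);
print hex_dump($round_trip), "\n";   # 73 C3 A4 67 61
```

In a script, the same decode and encode steps can be done with I/O
layers: open the input with '<:encoding(UTF-8)' and call
binmode(STDOUT, ':encoding(UTF-8)').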

-- 
Michael Ludwig