Re: Workaround to a unicode bug needed

Pierre Nugues wrote on 06.09.2010 at 22:02 (+0200):

2/ The output with "use utf8;"


This pragma tells the interpreter that your script source is in UTF-8.
So it affects the literals in your tr/// list. It does not tell the
interpreter what output encoding to use.
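
To illustrate the difference, here is a minimal sketch (not your script; the
word and the variable names are made up for the example, and the file is
assumed to be saved as UTF-8). The pragma only changes how the literal is
read; the output encoding is set separately on the filehandle:

```perl
use strict;
use warnings;
use utf8;                        # literals in this file are decoded as UTF-8
use Encode qw(encode);

my $word   = "säga";             # a character string because of "use utf8"
my $chars  = length $word;                    # counts characters
my $octets = length encode('UTF-8', $word);   # counts octets once encoded

# The output encoding is a property of the filehandle, not of the pragma.
binmode(STDOUT, ':encoding(UTF-8)');
print "$word: $chars characters, $octets octets\n";
```

I used ':encoding(UTF-8)' rather than ':utf8' here because it also validates
the data; for this purpose either layer would show the same thing.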

3/ With 
use utf8;
binmode(STDOUT, ':utf8');
I get (this time, the terminal can display the <C2> as a Â. This is
not correct. It strips the accented characters):


Some bytes might have been butchered away by the tr operator.

4/ With binmode(STDOUT, ':utf8') only (Then, there is a combination of
wrongly coded quotes in Latin 1 or Latin 9  that the terminal displays
and accented characters that are shown with their UTF-8 substitutes
interpreted as Latin 1 or Latin 9 characters);

»TjuvgÃ¶mmare
!
»
sÃ¤ga


Your output is double-encoded. This is what happens here:

(1) You're reading text encoded as UTF-8 in binary mode.
(2) Consequently, you don't have text in Perl: you have octets.
(3) You're applying some butchery to the octets using the tr operator.
(4) You're outputting the remaining octets through the :utf8 layer, which
    treats each octet as a Latin-1 character and encodes it as UTF-8 again.
(5) You're seeing garbage on the screen.
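
The steps above can be reproduced with a short sketch. It is not your script:
the sample word, the tr list and the variable names are assumptions for the
example, and the input is a hard-coded octet string standing in for a file
read in binary mode. It shows the double encoding and then one way to avoid
it, namely decoding on input and encoding exactly once on output:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# "säga" as UTF-8 octets, as if read from a file without a decoding layer.
my $octets = "s\xC3\xA4ga";

# What happens in steps (1) to (4): each octet is taken for a Latin-1
# character and encoded again, so the two octets of "ä" become four.
my $double = encode('UTF-8', $octets);

# Decode first, so that Perl has characters instead of octets.
my $text = decode('UTF-8', $octets);

# tr now works on whole characters (hypothetical list: ä -> a, ö -> o).
(my $plain = $text) =~ tr/\x{E4}\x{F6}/ao/;

# Encode exactly once on the way out.
my $single = encode('UTF-8', $text);

printf "double-encoded: %d octets\n", length $double;
printf "decoded:        %d characters\n", length $text;
printf "after tr:       %s\n", $plain;
```

In a real script you would normally let I/O layers do the decoding and
encoding, for example by opening the input with '<:encoding(UTF-8)' and
calling binmode on STDOUT, instead of calling decode and encode by hand.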

-- 
Michael Ludwig
