perl-unicode

Encode::_utf8_on and output

2003-05-29 22:30:04
On Sat, 18 Jan 2003, Jarkko Hietaniemi wrote:

Now Perl-5.8.1-to-be has been changed to

(1) not to do any implicit UTF-8-ification of any filehandles unless
    explicitly asked to do so (either by the -C command line switch
    or by setting the env var PERL_UTF8_LOCALE to a true value, the switch
    wins if both are present) (and if the locale settings do not indicate
....

Note that the above do not change the fact that if a *programmer* wants
their code to be UTF-8 aware, they need to think about the evil binmode().

Recently, I came across something curious. From this thread, we all know
that perl 5.8.0 does implicit 'UTF-8-ification' when it's run under a
UTF-8 locale and that perl 5.8.1 won't. The following script produces
five output files. (U+AC00 U+AC01 in EUC-KR is '0xb0 0xa1 0xb0 0xa2'.)
Under a UTF-8 locale and perl 5.8.0, default.out has

  c2 b0 c2 a1 c2 b0 c2 a2

while bytes.out, binmod.out, encode.out and default2.out have

  b0 a1 b0 a2
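
For reference, the doubled bytes in default.out are what one gets when
each byte of the EUC-KR string is treated as a Latin-1 character and then
encoded to UTF-8. A minimal sketch, assuming that this is what the
implicit UTF-8 handling of the filehandle amounts to (the names $euckr,
$doubled and hexdump are only for illustration):

```perl
use strict;
use warnings;
use Encode;

# EUC-KR bytes for U+AC00 U+AC01, written out by hand.
my $euckr = "\xb0\xa1\xb0\xa2";

# Treat each byte as a Latin-1 character (U+00B0, U+00A1, ...) and
# encode those characters to UTF-8.
my $doubled = Encode::encode("UTF-8", $euckr);

# Hex dump with one space-separated pair per byte.
sub hexdump { join ' ', unpack("(H2)*", $_[0]) }

print hexdump($euckr), "\n";    # b0 a1 b0 a2
print hexdump($doubled), "\n";  # c2 b0 c2 a1 c2 b0 c2 a2
```

This only reproduces the byte pattern; it does not show what the output
layer does internally.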

What made me curious is default2.out. I'm wondering how setting the UTF8
flag on what's an invalid UTF-8 string ($output) with Encode::_utf8_on
effectively made the output filehandle behave as if 'binmode' had been
called or the 'bytes' layer had been used. Needless to say, I wouldn't
rely on that, but I am interested to know how this happens.

Jungshik

P.S. BTW, is there any way to specify 'CHECK' for the 'encoding' layer?

----
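#!/usr/bin/perl -w
use strict;
use Encode;

my $input  = "\x{ac00}\x{ac01}";
my $output = encode("euc-kr", $input, Encode::FB_PERLQQ);
my $ofh;

# Default layers only.
open $ofh, ">", "default.out" or die "default.out: $!";
print $ofh $output;
close $ofh;

# Explicit :bytes layer.
open $ofh, ">:bytes", "bytes.out" or die "bytes.out: $!";
print $ofh $output;
close $ofh;

# Default layers, then binmode().
open $ofh, ">", "binmod.out" or die "binmod.out: $!";
binmode($ofh);
print $ofh $output;
close $ofh;

# Default layers, with the UTF8 flag forced on for the EUC-KR bytes.
open $ofh, ">", "default2.out" or die "default2.out: $!";
Encode::_utf8_on($output);
print $ofh $output;
close $ofh;

# Let the :encoding layer convert the original characters.
open $ofh, ">:encoding(euc-kr)", "encode.out" or die "encode.out: $!";
print $ofh $input;
close $ofh;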
---------------

