perl-unicode

Re: using Encode module

2003-12-11 06:30:05
Dana Sharvit - M <dana(_dot_)sharvit(_at_)exlibris(_dot_)co(_dot_)il> writes:
Hi,
I am using the Encode module (perl 5.8) to convert a string from utf8 to
big5.
There is something I do not understand that I thought you might help
with:
The input to the program is a file that contains a utf8 string.
The encoding works properly only when I use the following code:
use Encode qw(encode decode find_encoding from_to);

$file = shift;
open my $in,  $file;

while ($str = <$in>) {
  chomp;
     open my $in1,  "<:encoding(utf8)", \$str;
         while (<$in1>) {
         $octet = encode("utf8", $_);
         from_to($octet, "utf8","big5");
         print "$octet\n";
         }


}
close $in;

That code reads the file and decodes the UTF-8 to get characters
(because of the :encoding layer). Since they are now characters, you
re-encode them back to UTF-8 octets, then use from_to to take those
re-encoded octets, decode them again (internally to from_to), and then
re-encode them as big5.

This is a lot of pointless re-encoding!

You should either read the file as octets, or keep the :encoding 
(which is safer with respect to locale effects) and just encode:

open my $in1, "<:encoding(utf8)", $file or die "Cannot open $file: $!";
while (<$in1>) {
  chomp;
  $octet = encode("big5", $_);   # characters -> big5 octets, one step
  print "$octet\n";
}
close $in1;
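
As a rough illustration of why the extra steps are redundant, the sketch below runs both routes on the same input and compares the results. The sample string (U+4E2D U+6587) and all variable names are made up for the example, and it assumes an Encode installation that includes the big5 encoding.

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

# Hypothetical sample: UTF-8 octets for U+4E2D U+6587, written as escapes
# so the example does not depend on the encoding of this source file.
my $sample_utf8 = "\xE4\xB8\xAD\xE6\x96\x87";

# Route 1: decode once to characters, then encode once to big5.
my $chars       = decode("utf8", $sample_utf8);
my $big5_direct = encode("big5", $chars);

# Route 2: the original chain. Re-encode the characters to UTF-8 octets,
# then let from_to decode and re-encode them again, in place.
my $big5_chain = encode("utf8", $chars);
from_to($big5_chain, "utf8", "big5");

print "both routes give the same octets\n" if $big5_direct eq $big5_chain;
print unpack("H*", $big5_direct), "\n";   # hex dump of the big5 octets
```

If the two strings compare equal, the intermediate encode("utf8", ...) and the second decode inside from_to add nothing but work.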




What I don't understand is two things:
1. Why do I need to read the string using IO (open my $in1,
"<:encoding(utf8)", \$str;)?

If your environment expects big5 (as your raw print suggests), then
something is probably assuming that data read from files is big5 rather
than utf8, so you have to tell it otherwise.
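
One way to "tell it" in both directions is to put an explicit layer on the input and on the output handle, so nothing is left to a default. This is only a sketch: it uses in-memory handles and made-up names ($src_octets, $dst_octets) in place of a real file and STDOUT, and assumes the big5 encoding is available.

```perl
use strict;
use warnings;

# Hypothetical sample data held in scalars instead of real files.
my $src_octets = "\xE4\xB8\xAD\xE6\x96\x87\n";   # UTF-8 for U+4E2D U+6587
my $dst_octets = "";

# Input layer decodes UTF-8 octets to characters on read;
# output layer encodes characters to big5 octets on write.
open my $src, "<:encoding(utf8)", \$src_octets or die "open input: $!";
open my $dst, ">:encoding(big5)", \$dst_octets or die "open output: $!";

while (my $line = <$src>) {
    print {$dst} $line;   # no explicit encode() call needed
}
close $src;
close $dst or die "close output: $!";
```

With a real script the same idea would apply to STDOUT, for example via binmode(STDOUT, ":encoding(big5)"), instead of calling encode on each line.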

2. Why do I need to use the encode function before the from_to
function ($octet = encode("utf8", $_);)?

See above.


I thought that the code below would convert correctly, but it does not:
use Encode qw(encode decode find_encoding from_to);

$file = shift;
open my $in,  $file;

while ($str = <$in>) {
 chomp($str);
 from_to($octet, "utf8","big5");
 print "$octet\n";

Presumably, as from_to modifies the string "in place" (ugh), you meant:

$file = shift;
open my $in,  $file;
while (defined($str = <$in>)) {
   from_to($str, "utf8", "big5");
   print $str;
}

That should work unless you have something which causes open to assume 
some encoding.
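
A sketch of the octet-based route that guards against such an implicit encoding by asking for the ":raw" layer explicitly. The temporary file and the @converted array are hypothetical stand-ins so the example runs on its own; a real script would open the file named on the command line.

```perl
use strict;
use warnings;
use Encode qw(from_to);
use File::Temp qw(tempfile);

# Hypothetical sample input: a temp file holding one line of UTF-8 octets
# (U+4E2D U+6587), standing in for the real input file.
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "\xE4\xB8\xAD\xE6\x96\x87\n";
close $tmp_fh or die "Cannot close $file: $!";

# ":raw" asks for plain octets, so no default layer decodes the data first.
open my $in, "<:raw", $file or die "Cannot open $file: $!";
binmode STDOUT;   # pass the big5 octets through unchanged

my @converted;
while (defined(my $str = <$in>)) {
    # from_to rewrites $str in place and returns the new length in octets
    # (undef on failure), so its return value must not be used as the text.
    my $len = from_to($str, "utf8", "big5");
    die "conversion failed" unless defined $len;
    push @converted, $str;
    print $str;
}
close $in;
```

The point of the design is that from_to works on octets only, so the handle must deliver octets; an explicit layer in open takes precedence over defaults such as those set by the open pragma.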
 

}
close $in;

Thank you
Dana
