At 8:58 pm +0000 29/2/04, John Delacour wrote:
If /tmp/iba.txt contains the text "ibañez" in
UCS-2, preceded by the BOM, then this works
here (Perl 5.8.3):
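use Encode qw/encode decode/;
my $f_16 = qq~/tmp/iba.txt~;
# Read raw bytes: BOM followed by big-endian UCS-2
open F16, "<:raw", $f_16 or die "Cannot open $f_16: $!";
my $ucs2 = <F16> ;
close F16;
# UCS-2BE keeps the BOM as an ordinary character, U+FEFF
my $utf8 = decode("UCS-2BE", $ucs2) ;
print uc $utf8 ;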
I may be making a mountain out of a molehill, but
it looks as if the BOM is required, so the thing
to do would be to prepend the BOM to the string
to be processed whether or not one already exists.
In the script below I write the single character
\x00\xF1 to the file and then prepend two BOMs to
the contents as though the BOM was already there
and I was prepending another, and it works:
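use Encode qw/encode decode/;
my $f_16 = qq~/tmp/iba.txt~;
open F16, ">:raw", $f_16 or die "Cannot write $f_16: $!";
print F16 qq~\x00\xF1~; # small n with tilde, no BOM
close F16;
open F16, "<:raw", $f_16 or die "Cannot read $f_16: $!";
my $BOM = qq~\xFE\xFF~;
my $ucs2 = $BOM . $BOM . <F16> ; # two BOMs, as if one was already there
close F16;
my $utf8 = decode("UCS-2BE", $ucs2) ;
no warnings; # silences "Wide character in print"
print uc $utf8 ;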
This is certainly an inelegant way of doing
things, so I hope someone else has the "right"
answer.
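
One candidate, offered as a sketch rather than a
confirmed fix: the BOM may only appear to be
required because print, with no encoding layer,
emits Latin-1 bytes when every character is below
U+0100 and switches to UTF-8 once a wider character
such as U+FEFF is present. Encoding the output
explicitly avoids depending on that. The helper
name ucs2be_to_text is hypothetical and not part of
the scripts above.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (an assumption, not from the original scripts):
# decode big-endian UCS-2 bytes and drop a leading BOM if there is one.
sub ucs2be_to_text {
    my ($bytes) = @_;
    my $text = decode("UCS-2BE", $bytes);
    $text =~ s/\A\x{FEFF}//;    # a leading U+FEFF is the BOM, not data
    return $text;
}

# The same character, small n with tilde, with and without a BOM.
my $with_bom    = "\xFE\xFF\x00\xF1";
my $without_bom = "\x00\xF1";

# Encode explicitly on output, so the bytes written do not depend on
# whether the string happens to contain a character above U+00FF.
for my $bytes ($with_bom, $without_bom) {
    print encode("UTF-8", uc(ucs2be_to_text($bytes))), "\n";
}
```

Both inputs should then produce the same UTF-8
output, and no "no warnings" is needed because
print only ever sees bytes.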
JD