perl-unicode

Re: Converting string to UTF-16LE

2004-02-29 15:30:06

At 8:58 pm +0000 29/2/04, John Delacour wrote:

If /tmp/iba.txt contains the text "ibañez" in UCS-2, preceded by the BOM, then this works here (Perl 5.8.3):


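use Encode qw/encode decode/;
my $f_16 = qq~/tmp/iba.txt~;
# open the file for reading, stopping if that fails
open(F16, '<', $f_16) or die "Cannot open $f_16: $!";
binmode F16;                            # treat the contents as raw bytes
my $ucs2 = <F16>;                       # one "line" of raw UCS-2 bytes
close F16;
my $utf8 = decode("UCS-2BE", $ucs2);    # bytes to Perl characters
print uc $utf8;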
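
For a file that really does start with a BOM, another option may be Encode's generic "UTF-16" decoder. This is only a sketch: the bytes are built in memory rather than read from /tmp/iba.txt, and the ":utf8" layer on STDOUT is an assumption about the terminal.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "ibañez" as big-endian 16-bit units with a leading BOM (FE FF),
# built in memory here instead of being read from /tmp/iba.txt.
my $bytes = pack("n*", 0xFEFF, unpack("C*", "iba\xF1ez"));

# The generic "UTF-16" decoder takes the byte order from the BOM
# and leaves the BOM out of the decoded string.
my $text = decode("UTF-16", $bytes);

# Re-encode the upper-cased text as UTF-16LE, which writes no BOM.
my $le = encode("UTF-16LE", uc($text));

binmode STDOUT, ":utf8";    # assumption: the terminal expects UTF-8
print uc($text), "\n";
printf "%d characters, %d bytes as UTF-16LE\n", length($text), length($le);
```

Unlike "UCS-2BE", the "UTF-16" decoder refuses input that has no BOM, so it only suits data known to carry one.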

I may be making a mountain out of a molehill, but it looks as if the BOM is required, so the thing to do would be to prepend the BOM to the string to be processed whether or not it already exists. In the script below I write the single character \x00\xF1 to the file and then prepend two BOMs to the contents, as though the BOM were already there and I were prepending another, and it works:

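use Encode qw/encode decode/;
my $f_16 = qq~/tmp/iba.txt~;
open(F16, '>', $f_16) or die "Cannot write $f_16: $!";
binmode F16;
print F16 qq~\x00\xF1~; # small n with tilde, no BOM
close F16;

open(F16, '<', $f_16) or die "Cannot read $f_16: $!";
binmode F16;
my $BOM = qq~\xFE\xFF~;            # big-endian byte order mark
my $ucs2 = $BOM . $BOM . <F16>;    # two BOMs, then the file contents
close F16;
my $utf8 = decode("UCS-2BE", $ucs2);
no warnings;                       # silences e.g. "Wide character in print"
print uc $utf8;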

This is certainly an inelegant way of doing things, so I hope someone else has the "right" answer.
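
One thing that may be going on, sketched here with in-memory byte strings rather than the file: decode("UCS-2BE", ...) does not appear to need the BOM at all, and a leading FE FF simply comes through as the character U+FEFF. With no output layer, print writes a string whose characters all fit in one byte as Latin-1, but writes UTF-8 (with a "Wide character" warning) once a wider character such as U+FEFF is present, which might be why the BOM seemed to help. The helper code_points is a name invented for this sketch, and the ":utf8" layer on STDOUT is an assumption about the terminal.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UCS-2BE bytes for U+00F1, without and with a leading BOM.
my $bare     = "\x00\xF1";
my $with_bom = "\xFE\xFF" . $bare;

# UCS-2BE names the byte order explicitly, so no BOM is consulted;
# a leading FE FF just decodes to the character U+FEFF.
my $plain  = decode("UCS-2BE", $bare);
my $marked = decode("UCS-2BE", $with_bom);

# Hypothetical helper: list the code points of a character string.
sub code_points {
    return join " ", map { sprintf("U+%04X", ord($_)) } split(//, $_[0]);
}

binmode STDOUT, ":utf8";    # say how output is encoded, no BOM needed
print code_points($plain),  "\n";    # U+00F1
print code_points($marked), "\n";    # U+FEFF U+00F1
print uc($plain), "\n";              # U+00D1, written as UTF-8
```

If that reading is right, setting the output layer would do the job of the extra BOMs, and "no warnings" would no longer be needed.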

JD
