perl-unicode

RE: UTF-16 -> UTF-8

2001-11-21 06:59:05

Dear Rui,

I probably missed the start of this thread where you said that you couldn't
use Perl 5.6. But if you could use Perl 5.6, then something like this would
work:

open(INFILE, "<$ARGV[0]") || die "Can't read $ARGV[0]: $!";
open(OUTFILE, ">$ARGV[1]") || die "Can't write $ARGV[1]: $!";
binmode OUTFILE;

print OUTFILE pack('v', 0xfeff);
while(<INFILE>)
{
    s/\n$/\015\012/o;
    print OUTFILE pack('v*', unpack('U*', $_));
}

close(OUTFILE);
close(INFILE);

Some thoughts on this code, which I use.

1. It is set up to work in the Windows environment, hence the conversion of
newlines to CRLF and the use of 'v' (16-bit little-endian) for packing.
2. It doesn't support surrogates, so characters above U+FFFF will not be
written correctly. But you could get around this by changing the key line to
something like (this is untested):

$s = $_;
print OUTFILE pack('v*', map { $_ > 0xFFFF
    ? ((($_ - 0x10000) >> 10) + 0xD800, ($_ & 0x3FF) + 0xDC00)
    : $_ } unpack('U*', $s));
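
To make the surrogate arithmetic in point 2 easier to check, here is a minimal standalone sketch. It works on a list of code points rather than on file input, and the helper names utf16_units and utf16le_bytes are made up for illustration; they are not part of the script above. It assumes only core Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): turn a list of
# Unicode code points into UTF-16 code units, splitting anything above
# U+FFFF into a high/low surrogate pair.
sub utf16_units {
    my @units;
    for my $cp (@_) {
        if ($cp > 0xFFFF) {
            my $v = $cp - 0x10000;               # 20-bit offset past the BMP
            push @units, 0xD800 + ($v >> 10),    # high surrogate: top 10 bits
                         0xDC00 + ($v & 0x3FF);  # low surrogate: bottom 10 bits
        }
        else {
            push @units, $cp;                    # BMP code point: one unit
        }
    }
    return @units;
}

# 'v' packs each unit as a 16-bit little-endian value, as in the script above.
sub utf16le_bytes { return pack('v*', utf16_units(@_)) }

# U+0041 followed by U+1D11E (MUSICAL SYMBOL G CLEF, outside the BMP)
my $bytes = utf16le_bytes(0x41, 0x1D11E);
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes)), "\n";
# prints: 41 00 34 D8 1E DD
```

The subtraction of 0x10000 matters only for the high surrogate; the low 10 bits are the same with or without it. Using 'n*' in place of 'v*' would give big-endian output instead.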

HTH,
Martin
