perl-unicode

RE: UTF-16 -> UTF-8

2001-11-21 15:08:59
Dear Martin,

I can use Perl 5.6. In fact I'm using it. Thank you for your code, but I need
to write the converted words in UTF-16 to a database and not to a text file.
We were using the text file for output only to see whether the conversion was
being done properly. Our true objective was (and still is) to read a text in
UTF-8, parse it (for which Perl seems to be the best option) and write the
parts resulting from the parse to a database (Access to start with, and then
MS SQL Server).

Nevertheless thank you for your suggestions and your code.

Best regards.

Rui Ribeiro

Dear Rui,

I probably missed the start of this thread where you said that you couldn't
use Perl 5.6. But if you could use Perl 5.6, then something like this would
work:

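# Usage: script.pl <utf8-input> <utf16le-output>
open(INFILE, "<$ARGV[0]") || die "Can't read $ARGV[0]: $!";
open(OUTFILE, ">$ARGV[1]") || die "Can't write $ARGV[1]: $!";
binmode OUTFILE;    # stop Windows from translating \n on output

# Byte order mark, written little-endian
print OUTFILE pack('v', 0xfeff);
while (<INFILE>)
{
    # Windows line ending (CR LF)
    s/\n$/\015\012/;
    # UTF-8 -> code points -> 16-bit little-endian units
    print OUTFILE pack('v*', unpack('U*', $_));
}

close(OUTFILE);
close(INFILE);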

Some thoughts on this code, which I use.

1. It is set up to work in the Windows environment, hence the newline
tidy-ups and the use of 'v' (16-bit little-endian) for packing.
2. It doesn't support surrogates. But you could get around this by changing
the key line to something like (this is untested):

$s = $_;
# Code points above 0xFFFF: subtract 0x10000, then split into two 10-bit halves
print OUTFILE pack('v*', map { $_ > 0xFFFF
        ? ((($_ - 0x10000) >> 10) + 0xD800, ($_ & 0x3FF) + 0xDC00)
        : $_ } unpack('U*', $s));
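
The surrogate arithmetic can also be checked on its own, without any file
handling. A code point above 0xFFFF is first reduced by 0x10000; the top ten
bits of the result go into the high surrogate (0xD800 base) and the bottom ten
bits into the low surrogate (0xDC00 base). What follows is only a minimal
sketch: the helper name codepoints_to_utf16le is made up for illustration, and
it takes code points directly as arguments rather than decoding UTF-8 input.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): turn a list of
# Unicode code points into a UTF-16LE byte string held in a scalar.
sub codepoints_to_utf16le {
    my @units;
    for my $cp (@_) {
        if ($cp > 0xFFFF) {
            # Supplementary character: emit a surrogate pair
            my $v = $cp - 0x10000;
            push @units, 0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF);
        } else {
            # BMP character: one 16-bit unit
            push @units, $cp;
        }
    }
    # 'v' = unsigned 16-bit, little-endian
    return pack('v*', @units);
}

# 'A' (U+0041) followed by U+1D11E, which needs a surrogate pair
my $bytes = codepoints_to_utf16le(0x41, 0x1D11E);
print join(' ', map { sprintf '%02X', $_ } unpack('C*', $bytes)), "\n";
```

This should print 41 00 34 D8 1E DD, i.e. U+1D11E becomes the pair
D834 DD1E. Because the result is an ordinary scalar rather than file output,
the same value could presumably be handed to whatever writes to the database.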

HTH,
Martin


