Dear Martin,
I can use Perl 5.6; in fact, I am using it. Thank you for your code, but I need
to write the converted UTF-16 words to a database, not to a text file. We were
using the text file only to check that the conversion was being done properly.
Our real objective was, and still is, to read a text in UTF-8, parse it (for
which Perl seems to be the best option), and write the parts resulting from the
parse to a database (Access to start with, and later MS SQL Server).
Nevertheless, thank you for your suggestions and your code.
Best regards.
Rui Ribeiro
Dear Rui,
I probably missed the start of this thread where you said that you couldn't
use Perl 5.6. But if you could use Perl 5.6, then something like this would
work:
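    # Read UTF-8 text from the first argument, write UTF-16LE to the second.
    open(INFILE, "<$ARGV[0]") || die "Can't read $ARGV[0]: $!";
    open(OUTFILE, ">$ARGV[1]") || die "Can't write $ARGV[1]: $!";
    binmode OUTFILE;                    # no newline translation on output
    print OUTFILE pack('v', 0xfeff);    # byte order mark, little-endian
    while (<INFILE>)
    {
        s/\n$/\015\012/;                # Windows line ending (CR LF)
        # Under Perl 5.6, unpack 'U*' yields the code points of the UTF-8
        # line; pack 'v*' writes each one as a 16-bit little-endian unit.
        print OUTFILE pack('v*', unpack('U*', $_));
    }
    close(OUTFILE);
    close(INFILE);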
Some thoughts on this code, which I use.
1. It is set up to work in the Windows environment, hence the newline tidy-ups
and the use of 'v' (16-bit little-endian) for packing.
2. It doesn't support surrogates. But you could get around this by changing
the key line to something like (this is untested):
    $s = $_;
    print OUTFILE pack('v*', map {$_ > 0xFFFF
        ? ((($_ - 0x10000) >> 10) + 0xD800, ($_ & 0x3FF) + 0xDC00)
        : $_} unpack('U*', $s));
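The surrogate arithmetic in point 2 is easier to check on a concrete value. The following is a minimal sketch, not part of the code I actually use: the helper name utf16le_from_codepoints is made up for illustration. It takes a list of code points, subtracts 0x10000 from anything above U+FFFF before splitting it into a high and a low surrogate, and returns the UTF-16LE bytes as a string in memory instead of printing them to a file.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): convert a list of
# Unicode code points to a UTF-16LE byte string.
sub utf16le_from_codepoints {
    my @units = map {
        $_ > 0xFFFF
            # Supplementary character: offset by 0x10000, then the top 10
            # bits go in the high surrogate and the low 10 bits in the low.
            ? ((($_ - 0x10000) >> 10) + 0xD800, ($_ & 0x3FF) + 0xDC00)
            # Basic Multilingual Plane: one 16-bit unit, unchanged.
            : $_
    } @_;
    return pack('v*', @units);    # 'v' = 16-bit little-endian
}

# 'A', e-acute, and U+1F600 (outside the BMP, so it needs a surrogate pair).
my $bytes = utf16le_from_codepoints(0x41, 0xE9, 0x1F600);

# Show the result as hex bytes.
print join(' ', map { sprintf '%02X', $_ } unpack('C*', $bytes)), "\n";
# prints: 41 00 E9 00 3D D8 00 DE
```

U+1F600 comes out as the pair D83D DE00, written low byte first. Because the result is an ordinary string, it could in principle be handed to a database layer rather than a file handle; how it has to be bound depends on the driver, which this sketch does not cover.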
HTH,
Martin