perl-unicode

UTF-16, Perl and Microsoft...

2003-04-03 12:30:04
Hello Everybody,

I want to use Perl 5.8.0 to process XML files generated by a web
application built on Microsoft technology (W2K Server; I believe the
application is written in ASP).

These files appear to be UTF-16LE encoded: the first two bytes of each file
are FF FE, the byte order mark for little-endian UTF-16.
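The byte order check described above can be done in Perl by reading the first two bytes without any decoding. This is a minimal sketch written for a current Perl (not tested against 5.8.0); the helper name utf16_byte_order is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: report the byte order indicated by a file's BOM.
# ':raw' reads the bytes as stored, so no decoding or CRLF translation
# interferes with the check.
sub utf16_byte_order {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $n = read($fh, my $bom, 2);
    close $fh;
    return 'unknown' unless defined $n && $n == 2;
    return 'UTF-16LE' if $bom eq "\xFF\xFE";   # little-endian BOM
    return 'UTF-16BE' if $bom eq "\xFE\xFF";   # big-endian BOM
    return 'unknown';
}

# Usage: print utf16_byte_order('infile.xml'), "\n";
```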

I only need to do a kind of search-and-replace within these files, so I
must be able to read and write UTF-16LE correctly.
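For a plain search-and-replace, one option is to read the whole file through an encoding layer, edit it as a Perl string, and write it back. This sketch assumes the file fits in memory; the helper replace_in_utf16le and its arguments are hypothetical. It puts :raw in front of :encoding(UTF-16LE) so that, on Windows, no default :crlf layer translates line endings underneath the encoding; whether that is related to the corruption described here is an assumption. With an explicit UTF-16LE encoding the BOM is not stripped: it is read as the character U+FEFF and written back out.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every literal occurrence of $search with
# $replace in a UTF-16LE file, writing the result to $outfile.
sub replace_in_utf16le {
    my ($infile, $outfile, $search, $replace) = @_;

    # ':raw' first removes any default ':crlf' layer, so "\r\n" passes
    # through as ordinary characters and is written back unchanged.
    open my $in, '<:raw:encoding(UTF-16LE)', $infile
        or die "Cannot open $infile: $!";
    my $text = do { local $/; <$in> };    # slurp the whole file
    close $in;

    # \Q...\E makes $search a literal string rather than a regex.
    $text =~ s/\Q$search\E/$replace/g;

    open my $out, '>:raw:encoding(UTF-16LE)', $outfile
        or die "Cannot open $outfile: $!";
    print {$out} $text;
    close $out or die "Cannot close $outfile: $!";
}

# Usage: replace_in_utf16le('infile.xml', 'outfile.xml', 'old', 'new');
```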

My problem is the UTF-16 encoding. To check whether the encoding round-trips
correctly with files generated by this web application, I open a file as
UTF-16LE and immediately save it as UTF-16LE under another file name. The
output is somehow corrupted and cannot be imported by the MS application.
Here is the script I tried:

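use strict;
use warnings;

my $infile  = "infile.xml";
my $outfile = "outfile.xml";

open my $in,  "<:encoding(UTF-16LE)", $infile  or die "Cannot open $infile: $!";
open my $out, ">:encoding(UTF-16LE)", $outfile or die "Cannot open $outfile: $!";

# Copy line by line; with a lossless encoding this should reproduce the input.
while (<$in>) {
        print $out $_;
}

close($out) or die "Cannot close $outfile: $!";
close($in);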

What am I doing wrong? If the encoding worked correctly, this script
should simply copy infile.xml to outfile.xml byte for byte.
Is MS once again using its own proprietary variant of UTF-16?
How can I solve this problem?
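Since the script is meant to copy infile.xml to outfile.xml unchanged, a byte-level comparison can show where the copy first diverges, for example whether the damage starts at the first line ending. The sketch below is hypothetical (the name first_diff_offset is not from any library); File::Compare's compare() would only report whether the files differ, not where. If the first difference does fall at a line ending, one plausible but unconfirmed cause on Windows is that the default :crlf layer sits below :encoding(UTF-16LE), so newline translation is applied to bytes that are already encoded.

```perl
use strict;
use warnings;

# Hypothetical diagnostic helper: return the byte offset at which two
# files first differ, or -1 if they are byte-for-byte identical.
sub first_diff_offset {
    my ($file_a, $file_b) = @_;
    my @data;
    for my $f ($file_a, $file_b) {
        # ':raw' reads the bytes exactly as stored.
        open my $fh, '<:raw', $f or die "Cannot open $f: $!";
        local $/;                      # slurp mode: read the whole file
        my $bytes = <$fh>;
        close $fh;
        push @data, defined $bytes ? $bytes : '';
    }
    my ($x, $y) = @data;
    my $len = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $len - 1) {
        return $i if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    # The common prefix matches; the files differ only if one is longer.
    return length($x) == length($y) ? -1 : $len;
}

# Usage:
# my $off = first_diff_offset('infile.xml', 'outfile.xml');
# print $off < 0 ? "identical\n" : "first difference at byte $off\n";
```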

Thank you very much for any help!

matthias



