perl-unicode

RE: encoding(UTF16-LE) on Windows

2011-01-19 13:08:40
On Wed, 19 Jan 2011, Michael Ludwig wrote:
Erland Sommarskog wrote on 17.01.2011 at 13:57 (-0000):
I'm on Windows and I have this small script:

   use strict;
   open F, '>:encoding(UTF-16LE)', "slask2.txt";
   print F "1\n2\n3\n";
   close F;

When I open the output in a hex editor I see

  31 00 0D 0A 00 32 00 0D 0A 00 33 00 0D 0A 00


It looks like a bug to me. I'm getting the same result as you for:

* ActivePerl 5.10.1
* ActivePerl 5.12.1
* Strawberry 5.12.0

All three distributions produce correspondingly wrong output for UTF-16BE,
and also for UTF-16, which just adds the BOM.

Perl/Cygwin 5.10.1 does fine because its OS is "cygwin", so it doesn't
translate "\n" to CRLF.
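
The byte pattern in the hex dump is what you would get if the LF-to-CRLF
translation ran on the already-encoded bytes rather than on the characters.
A minimal sketch that simulates both orders with the Encode module, without
touching any I/O layers (the helper name hex_dump and the variable names are
made up for this illustration):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated uppercase hex pairs.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

my $text = "1\n2\n3\n";

# Wrong order: encode first, then turn every 0A byte into 0D 0A.
(my $broken = encode('UTF-16LE', $text)) =~ s/\x0A/\x0D\x0A/g;

# Right order: turn LF into CRLF on the characters, then encode.
(my $crlf = $text) =~ s/\n/\r\n/g;
my $correct = encode('UTF-16LE', $crlf);

print hex_dump($broken),  "\n";   # 31 00 0D 0A 00 32 00 0D 0A 00 33 00 0D 0A 00
print hex_dump($correct), "\n";   # 31 00 0D 00 0A 00 32 00 0D 00 0A 00 33 00 0D 00 0A 00
```

In the first line each inserted 0D is a lone byte, so every 16-bit code unit
after it is shifted by one byte. In the second line CR and LF are each a
complete two-byte code unit.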

You need to stack the I/O layers in the right order.  On output, the
:encoding() layer needs to be applied last (sit below :crlf in the stack),
*after* the :crlf layer has added the carriage returns.  Layers are pushed
left to right, so :crlf is written after :encoding() in the mode string.
The way to pop the default :crlf layer is to start out with the :raw
pseudo-layer:

  open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $filename) or die $!;
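
A minimal sketch for checking the result of that open call: it writes the
same three lines and reads the file back with :raw so that nothing is
translated on the way in. The temporary file from File::Temp and the
variable names are placeholders for this illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the experiment; any writable path would do.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

# :raw drops the default translation, :encoding() goes underneath, :crlf on top.
open(my $out, '>:raw:encoding(UTF-16LE):crlf', $filename) or die "open: $!";
print {$out} "1\n2\n3\n";
close($out) or die "close: $!";

# Read the bytes back untranslated and print them as hex pairs.
open(my $in, '<:raw', $filename) or die "open: $!";
my $bytes = do { local $/; <$in> };
close $in;

my $hex = join ' ', map { sprintf '%02X', ord } split //, $bytes;
print "$hex\n";
```

With the layers in this order each line should end in 0D 00 0A 00, that is,
CR and LF as two complete UTF-16LE code units.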

Cheers,
-Jan