perl-unicode

Re: encoding(UTF16-LE) on Windows

2011-01-19 04:11:25
Erland Sommarskog wrote on 17.01.2011 at 13:57 (-0000):
I'm on Windows and I have this small script:

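   use strict;
   use warnings;
   # Write three lines through the UTF-16LE encoding layer.
   open my $fh, '>:encoding(UTF-16LE)', "slask2.txt"
       or die "Cannot open slask2.txt: $!";
   print $fh "1\n2\n3\n";
   close $fh;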

When I open the output in a hex editor I see

  31 00 0D 0A 00 32 00 0D 0A 00 33 00 0D 0A 00

In other words (od -c):

    1  \0  \r  \n  \0   2  \0  \r  \n  \0   3  \0  \r  \n  \0

I would expect to see:

  31 00 0D 00 0A 00 32 00 0D 00 0A 00 33 0D 00 0A 00

Guess you would even expect:

    …                                   33 00 0D 00 0A 00

That is, I expect \n to be translated to 0D 00 0A 00, but it is currently
translated to three bytes (0D 0A 00).

It looks like a bug to me. I'm getting the same result as you for:

* ActivePerl 5.10.1
* ActivePerl 5.12.1
* Strawberry 5.12.0

All three builds show correspondingly wrong results for UTF-16BE, and
also for UTF-16, which just adds the BOM.
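
The three-byte sequence is what you would get if the LF-to-CRLF expansion
ran on the already-encoded bytes instead of on characters. Here is a minimal
sketch of that reading. It only simulates the two orderings with Encode and
does not touch PerlIO itself, so it is an illustration of the byte pattern
and not a confirmed diagnosis. The helper names (hex_dump, encode_then_crlf,
crlf_then_encode) are made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated uppercase hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Assumed order behind the reported output: encode to UTF-16LE first,
# then expand every 0A byte to 0D 0A on the raw bytes.
sub encode_then_crlf {
    my ($text) = @_;
    my $bytes = encode('UTF-16LE', $text);
    $bytes =~ s/\x0A/\x0D\x0A/g;
    return $bytes;
}

# Expected order: expand "\n" to "\r\n" on characters, then encode.
sub crlf_then_encode {
    my ($text) = @_;
    (my $copy = $text) =~ s/\n/\r\n/g;
    return encode('UTF-16LE', $copy);
}

print hex_dump(encode_then_crlf("1\n2\n3\n")), "\n";
print hex_dump(crlf_then_encode("1\n2\n3\n")), "\n";
```

The first line printed corresponds to the od -c output above, the second to
the output one would expect.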

Perl 5.10.1 on Cygwin does fine: it reports its OS ($^O) as "cygwin"
rather than "MSWin32", so it doesn't translate "\n" to CRLF.
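
If the layer order is indeed the cause, a commonly cited workaround is to
spell out the whole layer stack, so that :crlf sits on top of :encoding and
sees "\n" while it is still a character. The following is a sketch under that
assumption and has not been verified against the builds listed above.
write_utf16le_crlf and slurp_hex are illustrative names, and a temporary file
stands in for slask2.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write $text to $path as UTF-16LE with CRLF line endings.
# :raw clears any default CRLF handling, :encoding(UTF-16LE) turns
# characters into bytes, and the :crlf pushed on top expands "\n"
# before the text reaches the encoding layer.
sub write_utf16le_crlf {
    my ($path, $text) = @_;
    open my $out, '>:raw:encoding(UTF-16LE):crlf', $path
        or die "Cannot open $path: $!";
    print {$out} $text;
    close $out or die "Cannot close $path: $!";
    return;
}

# Read the file back without any translation and return hex pairs.
sub slurp_hex {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot open $path: $!";
    local $/;    # slurp mode: read the whole file at once
    my $bytes = <$in>;
    close $in;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

my (undef, $path) = tempfile(UNLINK => 1);
write_utf16le_crlf($path, "1\n2\n3\n");
print slurp_hex($path), "\n";
```

Comparing the printed hex against the expected sequence quoted above shows
whether the explicit stack helps on a given build.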

-- 
Michael Ludwig