perl-unicode

RE: encoding(UTF16-LE) on Windows

2011-01-28 18:55:40
On Fri, 21 Jan 2011, Erland Sommarskog wrote:
"Jan Dubois" (jand(_at_)activestate(_dot_)com) writes:
You need to stack the I/O layers in the right order.  The :encoding()
layer needs to come last (be at the bottom of the stack), *after* the
:crlf layer adds the additional carriage returns.  The way to pop the
default :crlf layer is to start out with the :raw pseudo-layer:

  open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $filename) or die $!;
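A minimal sketch of checking that layer stack end to end: it writes three lines through the layers and reads the raw bytes back for inspection. The scratch file created with File::Temp and the variable names are illustrative assumptions, not part of the original message.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file; any writable path would do.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

# :raw removes any default CRLF translation; text then passes through
# :crlf (top of the stack) first and through :encoding after that.
open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $filename) or die $!;
print {$fh} "Alfa\nBeta\nGamma\n";
close $fh or die $!;

# Read the file back without any translation to inspect the bytes.
open(my $in, "<:raw", $filename) or die $!;
my $bytes = do { local $/; <$in> };
close $in;

printf "%d bytes\n", length $bytes;
print join(" ", map { sprintf "%02X", ord } split //, $bytes), "\n";
```

If the layers are applied in the intended order, every newline should show up as 0D 00 0A 00 and the total should be an even number of bytes.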

So this works. But this does not:

   use strict;

   open F, '>slask.out';
   binmode(F, ':raw:encoding(UTF16-LE):crlf');
   print F "Alfa\nBeta\nGamma\n";

Looking at the file in a binary editor, I see:

  41 00 6C 00 66 00 61 00  0D 0A 00 42 00 65 00 74
  00 61 00 0D 0A 00 47 00  61 00 6D 00 6D 00 61 00
  0D 0A 00

In total 35 bytes. Which is a very odd number for a UTF16 file.
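
The byte count can be reproduced by hand. The sketch below uses only Encode and makes no claim about how PerlIO works internally; it simply assumes the CRLF translation ended up being applied to the already-encoded bytes instead of to the characters, which is consistent with the dump above. Variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "Alfa\nBeta\nGamma\n";

# Intended order: translate LF to CRLF on characters, then encode.
(my $crlf_text = $text) =~ s/\n/\r\n/g;
my $good = encode("UTF-16LE", $crlf_text);

# Wrong order: encode first, then turn each 0x0A byte into 0x0D 0x0A.
my $bad = encode("UTF-16LE", $text);
$bad =~ s/\x0A/\x0D\x0A/g;

printf "good: %d bytes\n", length $good;   # 19 characters * 2 = 38
printf "bad:  %d bytes\n", length $bad;    # 32 bytes + 3 stray 0x0D = 35
```

The wrong order inserts a single 0x0D byte before each 0x0A, which shifts every following code unit by one byte and gives the odd total.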

I've double-checked with Leon, who thinks that this is due to bug 38456:

    http://rt.perl.org/rt3//Public/Bug/Display.html?id=38456

He made a patch to fix the bug, and the patch has been applied to
bleadperl already.  I ran your sample script with 5.13.9 plus his
patch, and it generates a correct 38-byte file.  I'm not sure
if this change could/should be picked up for a 5.12.4 release as
well, but I guess it probably won't be.  But 5.14 should be out
in April or May anyway...

It looks like there is still a lot of brokenness lurking in
the internals of the Perl I/O layer implementation. :(

Cheers,
-Jan
