perl-unicode

RE: encoding(UTF16-LE) on Windows

2011-01-20 14:45:39
On Thu, 20 Jan 2011, Michael Ludwig wrote:
> Erland Sommarskog wrote on 20.01.2011 at 08:29 (-0000):
> > "Jan Dubois" (jand(_at_)activestate(_dot_)com) writes:
> > > You need to stack the I/O layers in the right order.  The :encoding()
> > > layer needs to come last (be at the bottom of the stack), *after* the
> > > :crlf layer adds the additional carriage returns.  The way to pop the
> > > default :crlf layer is to start out with the :raw pseudo-layer:
> > >
> > >   open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $filename) or die $!;
> >
> > Certainly not anywhere close to intuitive. And the explanation is even
> > more muddy. "Needs to come last" - it is smack in the middle. "after
> > the :crlf layer" - it comes before.
>
> The explanation makes sense; so much so that I overlooked the fact that
> this is simply not how it works. Luckily, you were being vigilant. :-)

Would you mind explaining how it is *not* working the way I
described it above?  I realize that the fact that layers work
as a "stack" may be confusing, which is why I annotated "last"
with "bottom of the stack".  Of course the layer that is last on the
stack (at the bottom) is the first in the list of layers passed to
open(), because the layers are pushed in list order and stacks are
LIFO (last in, first out):

   :raw                - clears the existing :crlf layer from the stack
                         (:pop would also work, but :raw is more robust)

   :encoding(UTF-16LE) - pushes the :encoding layer onto the stack.  This makes
                         it the last layer on the stack (and also still the
                         first, for now).

   :crlf               - pushes the :crlf layer onto the stack.  :encoding is
                         still the last layer, but :crlf is now the first.
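
One way to check the resulting stack is PerlIO::get_layers(), which lists
the layers from the bottom up. The following is only a minimal sketch: the
scratch file and the variable names are made up for illustration, and the
exact list printed differs between platforms and perl builds.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (hypothetical; any writable path works).
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $path) or die "open: $!";

# get_layers() returns the layer names bottom-first, i.e. in the same
# order as they would be written in an open() or binmode() layer string.
my @layers = PerlIO::get_layers($fh);
print join(" ", @layers), "\n";

# Index of the :encoding layer, and of the top-most :crlf layer.
my ($enc_pos)  = grep { $layers[$_] =~ /^encoding\(/ } 0 .. $#layers;
my ($crlf_pos) = reverse grep { $layers[$_] eq 'crlf' } 0 .. $#layers;

print ":encoding sits below :crlf\n" if $enc_pos < $crlf_pos;
close $fh;
```

A lower index means closer to the file, so :encoding having the smaller
index is what "bottom of the stack" refers to.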

Now when you print a string to the filehandle, it is passed to the
top-most layer first (:crlf), which applies s/\n/\r\n/g to the
string and then passes it on to the next lower layer, :encoding, which
does the encoding.  When the data reaches the bottom of the stack, it
is actually written to the filesystem.
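
To make that data flow concrete, here is a rough sketch that performs the
two steps by hand and compares the result with what the layer stack writes.
The sample text and the scratch file are assumptions for illustration only;
encode() comes from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

my $text = "one\ntwo\n";

# Step 1: what :crlf (top of the stack) does on output.
(my $with_cr = $text) =~ s/\n/\r\n/g;
# Step 2: what :encoding(UTF-16LE) (the layer below it) does to the result.
my $by_hand = encode('UTF-16LE', $with_cr);

# Now let the layer stack do the same work.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;
open(my $out, ">:raw:encoding(UTF-16LE):crlf", $path) or die "open: $!";
print {$out} $text;
close $out or die "close: $!";

# Read the file back as raw bytes, with no translation applied.
open(my $in, "<:raw", $path) or die "open: $!";
my $from_file = do { local $/; <$in> };
close $in;

print $from_file eq $by_hand ? "identical\n" : "different\n";
```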

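Files opened on Windows already have the :crlf layer pushed by default,
so you somehow need to get the :encoding layer *below* it.  If
you have it on top, then the crlf substitution happens *after* the
encoding: "\n" has already been encoded as the bytes 0A 00, and :crlf
then inserts a lone 0D byte in front of the 0A, giving 0D 0A 00, which
is no longer valid UTF-16LE.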
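
The difference between the two orders shows up in a hex dump. This is just
a sketch: hex_of_output() is a hypothetical helper written for this
message, and the second layer string recreates the problematic order
explicitly so that the effect is visible on any platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write "a\n" through the given layers and return
# the bytes that ended up in the file as a hex string.
sub hex_of_output {
    my ($layers) = @_;
    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;
    open(my $out, ">$layers", $path) or die "open: $!";
    print {$out} "a\n";
    close $out or die "close: $!";
    open(my $in, "<:raw", $path) or die "open: $!";
    my $raw = do { local $/; <$in> };
    close $in;
    return join ' ', map { sprintf '%02X', ord } split //, $raw;
}

# :encoding below :crlf, so CRLF translation happens before encoding.
my $good = hex_of_output(':raw:encoding(UTF-16LE):crlf');
# :encoding above :crlf, so CRLF translation hits the encoded bytes.
my $bad  = hex_of_output(':raw:crlf:encoding(UTF-16LE)');

print "good: $good\n";   # 61 00 0D 00 0A 00
print "bad:  $bad\n";    # 61 00 0D 0A 00
```

The second file has an odd number of bytes, so it cannot be a sequence of
16-bit code units.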

Cheers,
-Jan