perl-unicode

Re: encoding(UTF16-LE) on Windows

2011-01-20 08:51:33
Erland Sommarskog wrote on 20.01.2011 at 08:29 (-0000):
> "Jan Dubois" (jand(_at_)activestate(_dot_)com) writes:
> > You need to stack the I/O layers in the right order.  The :encoding()
> > layer needs to come last (be at the bottom of the stack), *after* the
> > :crlf layer adds the additional carriage returns.  The way to pop the
> > default :crlf layer is to start out with the :raw pseudo-layer:
> >
> >   open(my $fh, ">:raw:encoding(UTF-16LE):crlf", $filename) or die $!;
>
> Certainly not anywhere close to intuitive. And the explanation is even
> more muddy. "Needs to come last" - it is smack in the middle. "after
> the :crlf layer" - it comes before.

The explanation makes sense; so much so that I overlooked the fact that
this is simply not how it works. Luckily, you were being vigilant. :-)

What I can imagine is that handling the logical entity \n is some sort
of post-processing step, which would explain why it needs to come last.
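
One way to probe that guess without touching the disk is to push the same
text through both layer orders into in-memory filehandles and compare the
bytes. This is only a sketch: the helper name bytes_for and the sample
string are made up for illustration, and it assumes that in-memory handles
accept the same layer strings as real files. As far as I understand PerlIO,
layers are pushed left to right, and written data passes through the
last-named (topmost) layer first, so a :crlf listed last would see the
text before :encoding() does.

```perl
use strict;
use warnings;

# Hypothetical helper: write $text through the given layer string into an
# in-memory file and return the resulting bytes as space-separated hex.
sub bytes_for {
  my ($layers, $text) = @_;
  my $buf = '';
  open my $fh, ">$layers", \$buf or die "open $layers: $!";
  print $fh $text;
  close $fh or die "close: $!";
  return join ' ', map { sprintf '%02x', ord $_ } split //, $buf;
}

# Compare the two orders discussed in this thread.
for my $layers (':raw:encoding(UTF-16LE):crlf',
                ':raw:crlf:encoding(UTF-16LE)') {
  printf "%-32s %s\n", $layers, bytes_for($layers, "1\n");
}
```

If that model holds, only the first order should turn the newline into a
proper two-byte CR followed by a two-byte LF; the second would slip a
single 0d byte into the already encoded stream.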

Here's a short demo script to show various layer combinations and how
they go wrong:

          \,,,/
          (o o)
------oOOo-(_)-oOOo------
use strict;
use warnings;

my $str = "1\n2\n3\n";  # string to print
my $fno = 1;            # counter for filenames

# Write $str to a numbered file opened with the given layers, in order.
sub out {
  my $fn = sprintf 'u%02u-%s.txt', $fno++, (join '-', @_) || 'NONE';
  my $layers = join '', map ":$_", @_;
  printf STDERR "%30s => %-40s\n", $layers, $fn;
  open my $fh, ">$layers", $fn or die "open $fn: $!";
  print $fh $str;
  close $fh or die "close $fn: $!";
}

my $e = 'encoding(UTF-16LE)';
my $r = 'raw';
my $n = 'crlf';

out;            # default layers
out $r;         # reset default layers
out $r, $n;     # same as default on Windows
out $n, $r;     # :raw at the end resets *all* layers
out $e, $r;     # ditto
out $n, $e, $r; # ditto
out $e, $n, $r; # ditto
out $r, $e, $n; # appears illogical, but correct result
out $r, $n, $e; # appears logical, but wrong result
out $e, $n;
out $n, $e;
out $n, $r, $e; # :crlf reset
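
To see which of the files came out right, it helps to look at the raw
bytes rather than open them in an editor. A minimal sketch, assuming the
u*.txt files written by the script above are in the current directory;
the helper name hexdump is just for illustration:

```perl
use strict;
use warnings;

# Illustrative helper: return the bytes of a string as space-separated hex.
sub hexdump {
  my ($bytes) = @_;
  return join ' ', map { sprintf '%02x', ord $_ } split //, $bytes;
}

# Assumes the u*.txt files from the demo script are in the current directory.
my @files = glob 'u[0-9][0-9]-*.txt';
for my $fn (sort @files) {
  open my $fh, '<:raw', $fn or die "open $fn: $!";  # read bytes untranslated
  local $/;                                         # slurp the whole file
  my $bytes = <$fh>;
  close $fh;
  printf "%-45s %s\n", $fn, hexdump($bytes);
}
```

A correctly written UTF-16LE file with CRLF line ends should show each
line break as 0d 00 0a 00.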

-- 
Michael Ludwig