perl-unicode

RE: encoding(UTF16-LE) on Windows

2011-01-21 07:42:08
"Jan Dubois" (jand(_at_)activestate(_dot_)com) writes:
Now when you print a string to the filehandle, it will be passed
to the top-most layer first (:crlf), which will apply s/\n/\r\n/g to the
string and then pass it on to the next lower layer, :encoding, which
will do the encoding. When the data reaches the bottom of the stack it
is actually written to the filesystem.

Files opened on Windows already have the :crlf layer pushed by default,
so you somehow need to get the :encoding layer *below* it.  If
you have it on top, then the crlf substitution happens *after* the
encoding, leading to incorrect data.
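
A minimal sketch of one way to get that ordering, for illustration only.
It assumes the layer string ':raw:encoding(UTF-16LE):crlf', where the
leading :raw is meant to switch off the default :crlf before the two
layers are pushed in the wanted order. The helper name open_utf16le_crlf
and the scratch file are hypothetical, and the result on a particular
Perl build is worth checking by looking at the bytes in the file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: open $path for writing with :crlf on top of
# :encoding. Layers named in the mode string are pushed left to right,
# so the last one named is top-most and sees the printed string first.
sub open_utf16le_crlf {
    my ($path) = @_;
    open(my $fh, '>:raw:encoding(UTF-16LE):crlf', $path)
        or die "Cannot open $path: $!";
    return $fh;
}

# Scratch file for the demonstration only.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

my $fh = open_utf16le_crlf($path);
print {$fh} "one\ntwo\n";
close $fh or die "Cannot close $path: $!";

# Report the size so the effect of the layers can be checked by hand.
printf "%s is %d bytes\n", $path, -s $path;
```

With this order the newline is expanded to \r\n while the data is still
a character string, and only then is each character turned into two
bytes.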
 
There is still one thing that is not clear to me. The incorrect end-of-line
was

  0D 00 0A

But the way you describe it, I would expect it to be 

  0D 0A 00

That is, first the string is encoded in UTF-16LE and the newline gets
expanded from 0A to 0A 00. 

Next, the crlf layer jumps in and blindly adds a carriage return. Yet
somehow it manages to get the \r character correct (0D 00) all the same,
while the \n loses its high byte.
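
One way to check this is to write the same string through both layer
orders and dump the resulting bytes. The script below is only a sketch:
the helper name bytes_written and the two layer strings are assumptions
made for the experiment, not something taken from the original report.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write "a\n" through the given layer string and return the bytes that
# ended up in the file, as space-separated hex.
sub bytes_written {
    my ($layers) = @_;

    my ($tmp, $path) = tempfile(UNLINK => 1);
    close $tmp;

    open(my $out, ">$layers", $path) or die "Cannot open $path: $!";
    print {$out} "a\n";
    close $out or die "Cannot close $path: $!";

    # Read the file back without any translation at all.
    open(my $in, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                      # slurp mode
    my $data = <$in>;
    close $in;

    return join ' ', map { sprintf '%02X', ord($_) } split //, $data;
}

print "crlf above encoding: ",
    bytes_written(':raw:encoding(UTF-16LE):crlf'), "\n";
print "encoding above crlf: ",
    bytes_written(':raw:crlf:encoding(UTF-16LE)'), "\n";
```

Comparing the second line of output with the 0D 00 0A seen in the file
should show whether the carriage return is really inserted after the
encoding step.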

-- 
Erland Sommarskog, Stockholm, esquel(_at_)sommarskog(_dot_)se