perl-unicode

Re: BOM and principle of least surprise

2004-05-11 01:30:05
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
Nick Ing-Simmons wrote:

Larry Wall <larry(_at_)wall(_dot_)org> writes:

Right now, the meaning of "text" is subject to severe distortions
due to legacy issues.  But in the long run, "text" is going to mean
Unicode, and that probably means a UTF-8 file encoding at least in
the western world, 


Microsoft seem to be somewhat focused on some 16-bit form.

This thread started as a complaint that perl5 can't read a 
script saved as UCS-2/UTF-16 or whatever Windows uses.

Uh, really?  Perl 5.8+ should be able to do that, automatically.


On 18th March, Erland Sommarskog <sommar(_at_)algonet(_dot_)se> wrote:

Using a thing like utf8 to determine the encoding of character literals
is not a good idea. Suddenly someone saves the file in a different 
encoding, and guess what happens. And as long as Perl does not act
on byte-order marks, how would it be able to read a script that has
been saved in UTF-16LE, which is the normal way of saving Unicode data
on Windows?

I haven't tried this myself...


I thought the issue was about Perl not automatically guessing the
UTF-16 encoding of input data.

That is a related but separate issue.
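
For the input-data side of the question, here is a minimal sketch of what
"acting on the byte-order mark" could look like when done by hand. It assumes
the data is already in memory as raw bytes; sniff_bom is a hypothetical helper
written for this illustration, not part of Perl or Encode, and the sample bytes
are made up. It says nothing about how perl reads the script source itself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: map a leading BOM to an Encode encoding name.
# Returns nothing when no BOM is present. Note that FF FE is also the
# start of a UTF-32LE BOM (FF FE 00 00); this sketch ignores UTF-32.
sub sniff_bom {
    my ($bytes) = @_;
    return 'UTF-8'    if $bytes =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $bytes =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $bytes =~ /\A\xFE\xFF/;
    return;
}

# Made-up sample input: the bytes of "hi\n" as little-endian UTF-16,
# preceded by a BOM, roughly what a Windows editor would save.
my $bytes = "\xFF\xFE" . "h\x00i\x00\n\x00";

# Guess the encoding from the BOM alone.
my $guess = sniff_bom($bytes);

# Encode's "UTF-16" (no LE/BE suffix) reads the BOM itself, picks the
# byte order from it, and leaves the BOM out of the decoded string.
my $text = decode('UTF-16', $bytes);

print "BOM suggests: $guess\n";
print "decoded length: ", length($text), "\n";
```

The same "UTF-16" encoding name can be given to an I/O layer, as in
open(my $fh, '<:encoding(UTF-16)', $file), so the guessing only has to be
written by hand when the encoding is not known in advance at all.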