Re: BOM and principle of least surprise

Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:

Nick Ing-Simmons wrote:

Larry Wall <larry(_at_)wall(_dot_)org> writes:

Right now, the meaning of "text" is subject to severe distortions
due to legacy issues.  But in the long run, "text" is going to mean
Unicode, and that probably means a UTF-8 file encoding at least in
the western world,



Microsoft seem to be somewhat focused on some 16-bit form.

This thread started as a complaint that perl5 can't read a
script saved as UCS-2/UTF-16 or whatever Windows uses.


Uh, really?  Perl 5.8+ should be able to do that, automatically.



On 18th March, Erland Sommarskog <sommar(_at_)algonet(_dot_)se> wrote:


Using a thing like utf8 to determine the encoding of character literals
is not a good idea. Suddenly someone saves the file in a different 
encoding, and guess what happens. And as long as Perl does not act
on byte-order marks, how would it be able to read a script that has
been saved in UTF-16LE, which is the normal way of saving Unicode data
on Windows?


I haven't tried this myself...


I thought the issue was about Perl not automatically guessing the
UTF-16 encoding of input data.


That is a related but separate issue.
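
To make the "act on byte-order marks" part concrete, here is a rough sketch
of BOM sniffing. It is written in Python purely for illustration and is not
how perl itself handles script source; the names sniff_bom and decode_script
are made up for this example, and falling back to UTF-8 when no BOM is
present is an assumption, not something anyone in this thread proposed.

```python
import codecs

# Known BOMs, longest first, so that UTF-32LE (FF FE 00 00) is not
# mistaken for UTF-16LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes, default: str = "utf-8"):
    """Return (encoding, bom_length) based on a leading BOM.

    If no BOM is found, return (default, 0). The default is an
    assumption of this sketch, not a rule from the thread.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return default, 0


def decode_script(data: bytes) -> str:
    """Decode script source bytes, skipping the BOM if there is one."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding)
```

This only covers the case where a BOM is actually present. Guessing the
encoding of BOM-less input data is the separate issue mentioned above and
needs heuristics that this sketch does not attempt.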
