Re: BOM and principle of least surprise

Nick Ing-Simmons (nick(_at_)ing-simmons(_dot_)net) writes:

Erland Sommarskog <sommar(_at_)algonet(_dot_)se> writes:

I would really expect someone to have done this already, but I see no
reference to such a module. Or layer-directive like "<:use-bom" to open
the file. And then some way to open an output file "same mode as that
handle".
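For what it is worth, the "same mode as that handle" part can be approximated in plain Perl. This is only a minimal sketch: copy_encoding_layer is a made-up name, it assumes a perl recent enough to provide PerlIO::get_layers, and it copies only an :encoding(...) layer, not :crlf or anything else on the stack.

```perl
use strict;
use warnings;

# Hypothetical helper (not an existing module): give $out the same
# :encoding(...) layer that $in already has, if it has one.
sub copy_encoding_layer {
    my ($in, $out) = @_;

    # get_layers returns names without colons, e.g. "encoding(UTF-16LE)".
    my ($layer) = grep { /^encoding\(/ } PerlIO::get_layers($in);
    return unless defined $layer;    # plain 8-bit handle: nothing to copy

    binmode($out, ":$layer") or die "Cannot apply :$layer: $!";
    return $layer;
}
```

Note that this does not write a BOM to the output; the caller would still have to print "\x{FEFF}" first if one is wanted.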


Seems you are the 1st (at least to care) - so in true OpenSource 
spirit you would write the module and contribute it.


Unfortunately my field of expertise is not in the area of C++ programming
or Perl internals. Believe me, you would not want to see my miserable
code entered into the Perl code base. :-)

I guess that if I want to write a utility that can handle Unicode
files, I will implement the file-opening in Perl in some private
module.
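Such a private routine might look roughly like the following. It is a sketch under assumptions, not tested on Win32: open_with_bom is a hypothetical name, the fallback encoding is an arbitrary choice, UTF-32 is ignored, and no CRLF translation is done (that would need a :crlf layer on top of the encoding).

```perl
use strict;
use warnings;

# Hypothetical helper: open a text file for reading and choose the
# encoding from a leading BOM, falling back to $default if none is found.
sub open_with_bom {
    my ($path, $default) = @_;
    $default = 'iso-8859-1' unless defined $default;   # assumed fallback

    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $got = read($fh, my $head, 3);
    die "Cannot read $path: $!" unless defined $got;

    my ($enc, $skip) = ($default, 0);
    if    (substr($head, 0, 3) eq "\xEF\xBB\xBF") { ($enc, $skip) = ('UTF-8',    3) }
    elsif (substr($head, 0, 2) eq "\xFF\xFE")     { ($enc, $skip) = ('UTF-16LE', 2) }
    elsif (substr($head, 0, 2) eq "\xFE\xFF")     { ($enc, $skip) = ('UTF-16BE', 2) }

    # Reposition just past the BOM (or at the start), then decode from there.
    seek($fh, $skip, 0) or die "Cannot seek in $path: $!";
    binmode($fh, ":encoding($enc)") or die "Cannot set encoding $enc: $!";
    return ($fh, $enc);
}
```

Called as my ($fh, $enc) = open_with_bom($file), it returns the encoding name as well, so the caller can open the output file with ">:encoding($enc)" to match.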

Many _programs_ yes. So when you write a perl _program_ you can 
handle it. The C++ language doesn't do this for you, so why should Perl?
Now there may well be a C++ _library_ which does this, so there 
could be a perl _library_ (module) which did it too.


But Perl is not C++. C++ is a strongly typed language where you use
different functions for 8-bit and Unicode data. Perl is also a higher-
level language that does more work for me. I'd say that it would be
perfectly in the spirit of Perl to magically handle a file as ASCII or
Unicode without me having to bother.

It would seem the best place to do this would be to change
the initial layer in Win32 to a new layer (say :bomcrlf).
This layer would get popped on binmode() - fixing above.
It would look at the first few bytes it got from the OS and then, if
they were a BOM, push an encoding() layer beneath itself and mutate into
a :crlf layer with the UTF8 flag set.


Yes, that sounds like a good way that would ensure compatibility and
still give me what I want. When is Santa coming to town? :-)

However, that does not really help when the Perl script itself is in
UTF-16 or UTF-8.
 
Anyway, thanks for all the replies. This is not really a big deal for
me at the moment. I was just puzzled by the results of my tests. Since
I am working with a module that will support Unicode data, I'm a little
nervous that I will get questions from users about the topic.

-- 
Erland Sommarskog, Stockholm, sommar(_at_)algonet(_dot_)se