Erland Sommarskog <sommar(_at_)algonet(_dot_)se> writes:
Nick Ing-Simmons (nick(_at_)ing-simmons(_dot_)net) writes:
Erland Sommarskog <sommar(_at_)algonet(_dot_)se> writes:
I would really expect someone to have done this already, but I see no
reference to such a module. Or layer-directive like "<:use-bom" to open
the file. And then some way to open an output file "same mode as that
handle".
Seems you are the 1st (at least to care) - so in true OpenSource
spirit you would write the module and contribute it.
Unfortunately my field of expertise is not in the area of C++ programming
or Perl internals. Believe me, you would not want to see my miserable
code entered into the Perl code base. :-)
Well you only learn by trying - but that is your choice.
I guess that if I want to write a utility which can handle Unicode
files, I will implement the file-opening in Perl in some private
module.
That would be a reasonable way to prototype stuff for core anyway.
With perl5.7+'s "layers" it should be possible to do this as module.
(Which was at least part of motivation for inventing them.)
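As a rough illustration of the "private module" idea, here is a minimal sketch in plain Perl of opening a file and choosing the encoding layer from its BOM. The name open_with_bom is hypothetical; nothing like it exists in core. It only checks for the UTF-8 and UTF-16 BOMs (a UTF-32LE file would be mistaken for UTF-16LE), and it leaves out CRLF translation.

```perl
use strict;
use warnings;

# Hypothetical helper, not an existing module: open a file for reading and
# pick the encoding layer from its byte-order mark (BOM), if there is one.
sub open_with_bom {
    my ($path) = @_;

    # Start in byte mode so the BOM can be inspected untouched.
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";

    # Read up to three bytes, the length of the longest BOM checked here.
    my $head = '';
    my $got  = read($fh, $head, 3);
    die "Cannot read $path: $!" unless defined $got;

    my ($encoding, $bom_len) = (undef, 0);
    if    ($head =~ /^\xEF\xBB\xBF/) { ($encoding, $bom_len) = ('UTF-8',    3) }
    elsif ($head =~ /^\xFF\xFE/)     { ($encoding, $bom_len) = ('UTF-16LE', 2) }
    elsif ($head =~ /^\xFE\xFF/)     { ($encoding, $bom_len) = ('UTF-16BE', 2) }

    # Position just past the BOM, or back at the start when there is none.
    seek($fh, $bom_len, 0) or die "Cannot seek in $path: $!";

    # Without a BOM the handle stays in byte mode, as for a plain 8-bit file.
    binmode($fh, ":encoding($encoding)") if defined $encoding;

    return ($fh, $encoding);
}
```

The returned encoding name could then be reused to open an output file in the "same mode", for example with a ">:encoding($encoding)" layer. Note that the UTF-16LE and UTF-16BE encodings do not write a BOM themselves, so the script would have to print "\x{FEFF}" first if it wants one.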
Many _programs_ yes. So when you write a perl _program_ you can
handle it. C++ language doesn't do this for you, why should Perl?
Now there may well be a C++ _library_ which does this, so there
could be a perl _library_ (module) which did it too.
But Perl is not C++. C++ is a strongly typed language where you use
different functions for 8-bit and Unicode data. Perl is also a higher-
level language that does more work for me.
But there is a limit - or there would be just one perl program:
    #!/usr/bin/perl
    exit(do_what_I_mean(@ARGV));
I'd say that it would be
perfectly in the spirit of Perl to magically handle file as ASCII or
Unicode without me having to bother.
Agreed - but magic doesn't create itself.
It would seem best place to do this would be to change
the initial layer in Win32 to a new layer (say :bomcrlf).
This layer would get popped on binmode() - fixing above.
It would look at 1st few bytes it got from OS and then if it was
a BOM push an encoding() layer beneath itself and mutate into
a :crlf layer with UTF8 flag set.
Yes, that sounds like a good way that would ensure compatibility and
still give me what I want. When is Santa coming to town? :-)
Implied timescale sounds viable ;-)
However, that does not really help when the Perl script itself is in
UTF-16 or UTF-8.
Yes it does - I _think_ one or more of
    perl -MWin32BOM UTF-16_script
or
    set PERL5OPT=-MWin32BOM
or
    set PERLIO=bomcrlf
(with magical autoload)
could be made to work.
If it happens in core-perl it can certainly work.
Anyway, thanks for all the replies. This is not really a big deal for
me at the moment. I was just puzzled by the results of my tests. Since
I am working with a module that will support Unicode data, I'm a little
nervous that I will get questions from users about the topic.