Hi all,
I ran into an issue for which I have a workaround in the meantime, but the
behaviour looks a bit weird. Perhaps someone has an explanation, or there is
some problem behind it which is worth investigating further.
My application schedules little "tasks" which are implemented as Perl
modules. They get a file as input, parse it, connect to databases and
neighbouring systems and then spit out files. All that is Perl, originally
5.8.8 but I found 5.10.0 to have the very same behaviour.
When providing the input file these modules simply read from <>, the
magical <ARGV> filehandle. @ARGV is empty, so it is expected that <> reads
from STDIN. The "controller" which calls the modules takes care that the
input file is opened to STDIN.
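
To make the setup concrete, here is a minimal sketch of that arrangement. The names run_task and task_input.txt are made up for illustration and are not part of the real application; the real modules do much more than count lines.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a task module: it only ever reads from <>.
sub run_task {
    my $count = 0;
    while (my $line = <>) {
        $count++;
    }
    return $count;
}

# Hypothetical controller: create a small input file, attach it to STDIN,
# and keep @ARGV empty so that <> falls back to reading STDIN.
my $input = "task_input.txt";
open(my $out, ">", $input) or die "cannot write $input: $!";
print {$out} "first line\nsecond line\n";
close($out) or die "cannot close $input: $!";

@ARGV = ();
open(STDIN, "<", $input) or die "cannot open $input: $!";
my $lines_read = run_task();
close(STDIN);

print "task read $lines_read line(s)\n";
```

No I/O layer is involved here; this is the plain case that worked for me.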
All went well until I tried to use another input encoding. That boils down
to the following piece of code (foo.txt has to be provided; ideally it
should contain something which shows whether the encoding works or not. As
I am sitting on an ISO-8859 system I used a UTF-8 encoded file which
contains some characters represented as multibyte sequences, e.g. a
German umlaut character "ä" which is hex c3 a4 in UTF-8):
open(STDIN,"<:encoding(UTF-8)","foo.txt");
print while <STDIN>; # I do not yet use <> here
close(STDIN);
This did not work, i.e. the UTF-8 multibyte characters were not converted
to ISO-8859. I then realized that in this case STDIN does not seem to be
closed automatically before it is reopened, so closing it explicitly first
did the right thing, at least at first:
close(STDIN);
open(STDIN,"<:encoding(UTF-8)","foo.txt");
print while <STDIN>;
close(STDIN);
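
One way to check whether the encoding layer really ends up on the handle is to ask Perl for the layer list. The following is only a diagnostic sketch: it writes its own small foo.txt (an assumption, so that it runs stand-alone) and reports to STDERR. PerlIO::get_layers is the core function documented in the PerlIO manual page.

```perl
use strict;
use warnings;

# Create a small UTF-8 test file: "ä" is the byte sequence c3 a4.
my $file = "foo.txt";
open(my $out, ">:raw", $file) or die "cannot write $file: $!";
print {$out} "\xc3\xa4\n";
close($out) or die "cannot close $file: $!";

# Reopen STDIN on the file as in the example above, with error checking.
close(STDIN);
open(STDIN, "<:encoding(UTF-8)", $file) or die "cannot open $file: $!";

# Report the descriptor and the layers that are active on STDIN.
my @layers = PerlIO::get_layers(*STDIN);
print STDERR "fileno(STDIN) = ", fileno(STDIN), "\n";
print STDERR "layers: @layers\n";

# If decoding works, the two bytes c3 a4 arrive as a single character.
my $line = <STDIN>;
chomp $line;
print STDERR "length of first line: ", length($line), "\n";
close(STDIN);
```

Running the same check without the leading close(STDIN) should show whether the layer list differs between the two variants.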
Then I tried not to read from <STDIN> but from <> and that resulted in a
BIG surprise: The following code in fact displays the POD source of the
PerlIO manual page (yes, it's true):
close(STDIN);
open(STDIN,"<:encoding(UTF-8)","foo.txt");
print while <>;
close(STDIN);
Needless to say, to me that was a new flavour of being told that I did
something terribly wrong ;-))
Now to the way I could solve it: Opening the input file on a filehandle of
its own and then doing a dup (or, as here, even an fdopen) to STDIN works
correctly:
open(IN,"<:encoding(UTF-8)","foo.txt");
close(STDIN);
open(STDIN,"<&=IN") or die;
print while <>;
close(IN);
close(STDIN);
Even shorter is this one, which achieves the same and somehow does not even
need an explicit close of STDIN beforehand:
open(IN,"<:encoding(UTF-8)","foo.txt");
@ARGV = ('<&=IN');
print while <>;
close(IN);
close(STDIN);
(That idea comes from the perlopentut manual page where Tom Christiansen
wonders whether anyone uses that...)
Any comments?
Christian