perl-unicode

Unexpected behaviour of <>

2008-08-27 09:30:38
Hi all,

I ran into an issue for which I have found a workaround in the meantime, but 
the behaviour looks a bit weird. Perhaps someone has an explanation, or there 
is some underlying problem that is worth investigating further.

My application schedules little "tasks" which are implemented as Perl 
modules. They get a file as input, parse it, connect to databases and 
neighbouring systems and then spit out files. Everything is Perl, originally 
5.8.8, but 5.10.0 shows exactly the same behaviour.

To get their input, these modules simply read from <>, i.e. the magic ARGV 
filehandle. @ARGV is empty, so <> is expected to read from STDIN. The 
"controller" which calls the modules takes care of opening the input file 
on STDIN.
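A minimal sketch of that controller/module split, assuming the controller simply redirects STDIN with a plain open (no layers) before the module runs; the temporary input file is hypothetical. With @ARGV empty, <> falls back to "-", i.e. standard input, and $ARGV reports "-" as the name of the file being read:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input file standing in for the real task input.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "line one\nline two\n";
close($tmp) or die "close: $!";

# Controller side: put the input file on STDIN (plain open, no layers).
open(STDIN, '<', $path) or die "cannot redirect STDIN to $path: $!";

# Module side: @ARGV is empty, so <> reads from "-", i.e. STDIN.
@ARGV = ();
my @lines;
my $source;
while (my $line = <>) {
    push @lines, $line;
    $source = $ARGV;    # name of the "file" <> is currently reading
}
print "read ", scalar(@lines), " lines from '$source'\n";
```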

All went well until I tried to use a different input encoding. The problem 
boils down to the following piece of code (you have to provide foo.txt; 
ideally it contains something that shows whether the decoding works or not. 
As I am sitting on an ISO-8859 system, I used a UTF-8 encoded file containing 
some characters represented as multi-byte sequences, e.g. the German umlaut 
"ä", which is hex c3 a4 in UTF-8):
   open(STDIN,"<:encoding(UTF-8)","foo.txt");
   print while <STDIN>;         # I do not yet use <> here
   close(STDIN);

This did not work: the UTF-8 "special" characters were not converted to 
ISO-8859. I then realized that in this case STDIN apparently is not closed 
implicitly by the open, so closing it explicitly first did the right thing, 
at least at first:
   close(STDIN);
   open(STDIN,"<:encoding(UTF-8)","foo.txt");
   print while <STDIN>;
   close(STDIN);

Then I tried reading from <> instead of <STDIN>, and that resulted in a BIG 
surprise: the following code actually prints the POD source of the PerlIO 
manual page (yes, really):
   close(STDIN);
   open(STDIN,"<:encoding(UTF-8)","foo.txt");
   print while <>;
   close(STDIN);

Needless to say, that was a new way of being told that I had done something 
terribly wrong ;-))
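When a reopened handle misbehaves like this, one quick check is to ask PerlIO which layers actually sit on a handle, and whether a line really comes back decoded. The sketch below uses a hypothetical temporary file holding the bytes c3 a4 and reads it once without and once with :encoding(UTF-8). PerlIO::get_layers is built into perl; the same loop can be pointed at STDIN right after one of the reopen variants above to see what ended up on it:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical test file: "ä" as the UTF-8 byte pair c3 a4, plus a newline.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "\xc3\xa4\n";
close($tmp) or die "close: $!";

# Read the same file once without and once with the encoding layer.
open(my $raw, '<', $path)                 or die "open raw: $!";
open(my $dec, '<:encoding(UTF-8)', $path) or die "open decoded: $!";
my $raw_line = <$raw>;
my $dec_line = <$dec>;

# List the layers on each handle, including the current STDIN.
for my $pair (['raw', $raw], ['decoded', $dec], ['STDIN', \*STDIN]) {
    my ($name, $fh) = @$pair;
    printf "%-8s layers: %s\n", $name, join(' ', PerlIO::get_layers($fh));
}

# Undecoded: two bytes (c3 a4). Decoded: one character, U+00E4.
printf "raw:     %d chars before newline\n", length($raw_line) - 1;
printf "decoded: %d char(s), U+%04X\n", length($dec_line) - 1, ord($dec_line);
```

If the decoded handle reports no encoding(...) layer, or the line still has two characters, the decoding layer was lost somewhere along the way.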

Now to the way I solved it: opening the input file on a filehandle of its 
own and then dup'ing it onto STDIN (or, as here, even fdopen'ing it) works 
correctly:
   open(IN,"<:encoding(UTF-8)","foo.txt");
   close(STDIN);
   open(STDIN,"<&=IN")  or  die;
   print while <>;
   close(IN);
   close(STDIN);

Even shorter is the following, which achieves the same and does not even 
need an explicit close of STDIN beforehand:
   open(IN,"<:encoding(UTF-8)","foo.txt");
   @ARGV = ('<&=IN');
   print while <>;
   close(IN);
   close(STDIN);

(The idea comes from the perlopentut manual page, where Tom Christiansen 
wonders whether anyone actually uses that form...)
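For comparison, a sketch of an alternative that avoids closing and reopening STDIN altogether: the controller redirects STDIN with a plain open (no layer in the mode string), and the module then pushes the decoding layer onto the existing handle with binmode before reading via <>. The temporary file is hypothetical, and this variant has not been checked against 5.8.8 or 5.10.0:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: "ä" encoded as UTF-8 (bytes c3 a4), then a newline.
my ($tmp, $path) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "\xc3\xa4\n";
close($tmp) or die "close: $!";

# Controller: redirect STDIN with a plain open, without an explicit close
# and without any layer in the open mode.
open(STDIN, '<', $path) or die "cannot redirect STDIN to $path: $!";

# Module: push the decoding layer onto the existing STDIN handle ...
binmode(STDIN, ':encoding(UTF-8)') or die "binmode: $!";

# ... and read via <>, which falls back to STDIN because @ARGV is empty.
@ARGV = ();
my @chars;
while (my $line = <>) {
    chomp $line;
    push @chars, map { sprintf 'U+%04X', ord($_) } split //, $line;
}
print "@chars\n";    # one code point per decoded character
```

The design choice here is that the module, not the controller, decides on the input encoding, and STDIN itself is never closed in between.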

Any comments? 

Christian


---------------------------------------------
Zeppelin Baumaschinen GmbH
Handelsregister - Commercial register: AG München HRB 107767
Sitz - Domicile: D-85748 Garching b. München

Vorsitzender des Aufsichtsrats - Chairman of the Supervisory Board:
Ernst Susanek
Geschäftsführer - Board of Management:
Michael Heidemann (Vorsitzender - Chairman), Christian Dummler
---------------------------------------------

  • Unexpected behaviour of <>, Christian Reiber