Re: Strange "UTF-8" problem


orasnita(_at_)fcc(_dot_)ro said:

The one line script was:
perl -e 'open(I, "lang.lng"); while(<I>) {next if /^[\t ]*#/; next unless
/\w/; s/\s*$//; print;}'

The text file lang.lng was:
line1=mâta
lala=tâta

It is not a UTF-8 encoded file, but a simple ANSI/Unix end of line file
that contains some special chars in latin1 character set.

The program works fine with Perl 5.8.3 under Red Hat, but it gives me
that error if running it on another system where I have perl 5.8.0.
However, I guess this has nothing to do with the version of perl....


Actually, it might have everything to do with the version.  Perl 5.8.0
pays attention to the current locale setting (LANG, LC_ALL or LC_CTYPE):
if the locale refers to UTF-8, as it may well on a Red Hat system, Perl
5.8.0 by default tries to treat every input file as UTF-8, so the Latin-1
bytes in your file look like malformed UTF-8.

This was soon recognized as a bad idea.  Starting with 5.8.1, Perl opens
input files as raw bytes (no special character semantics) by default,
whatever the locale says, and you have to specify ":utf8" in the open
statement or via binmode in order to have the input data interpreted as
UTF-8.

To get your one-liner to behave the same on 5.8.0 as it does on 5.8.3, you
need to add "use bytes;" (in a one-liner, the -Mbytes switch does the same
thing).  This is not necessary, but does no harm, when running 5.8.3.
-- 
-----------
David Graff                     Linguistic Data Consortium
graff(_at_)ldc(_dot_)upenn(_dot_)edu            3600 Market St., Suite 810
voice: (215) 898-0887           University of Pennsylvania
fax:   (215) 573-2175           Philadelphia, PA 19104
                http://www.ldc.upenn.edu