At 00:27 +0100 18/6/10, I wrote:
If I save the file and undo the second decoding I get the proper output
In this case all talk of iso-8859-1 and cp1252 is a red herring. I
read several Italian websites where this same problem is manifest in
external material such as ads. The news page proper is encoded
properly and declared as utf-8, but I imagine the web designers have
assumed that whatever they receive from the advertisers is
windows-1252 and convert it to utf-8 accordingly, rather than bother
to verify the encoding. As a result, material that arrives already
in utf-8 undergoes a superfluous second encoding.
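To make the mechanism concrete, here is a minimal sketch of what I
think is happening, using a single "e grave" (U+00E8). The sub names
double_encode and undo_double_encode are made up for this
illustration, and the choice of cp1252 as the wrongly assumed
encoding is only my guess from above, not something I have verified
on any particular site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: reproduce the suspected mistake.
sub double_encode {
    my ($text) = @_;
    my $utf8_bytes = encode("UTF-8", $text);         # correct utf-8 bytes
    my $misread    = decode("cp1252", $utf8_bytes);  # bytes wrongly taken for windows-1252
    return encode("UTF-8", $misread);                # superfluous second encoding
}

# Hypothetical helper: reverse the mistake (assumes cp1252 was the wrong guess).
sub undo_double_encode {
    my ($bytes) = @_;
    my $misread = decode("UTF-8", $bytes);           # strip the second encoding
    return encode("cp1252", $misread);               # back to the original utf-8 bytes
}

my $original = "\x{E8}";                             # e with grave accent
my $good     = encode("UTF-8", $original);
my $bad      = double_encode($original);
my $fixed    = undo_double_encode($bad);

# Show the bytes in hex so the terminal encoding does not get in the way.
print "good:  ", unpack("H*", $good),  "\n";         # c3a8
print "bad:   ", unpack("H*", $bad),   "\n";         # c383c2a8
print "fixed: ", unpack("H*", $fixed), "\n";         # c3a8
```

The two bytes C3 A8 become four (C3 83 C2 A8), which a utf-8 aware
reader displays as "Ã¨" instead of "è".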
Here's a way to get the file in question properly encoded:
#!/usr/bin/perl
use strict;
use LWP::Simple;
use Encode;
no warnings; # avoid wide character warning
my $tempdir = "/tmp";
my $tempfile = "tempfile";
my $f = "$tempdir/$tempfile";
my $uri="http://pipes.yahoo.com/pipes/pipe.run".
"?Size=Medium&_id=f53b7bed8b88412fab9715a995629722".
"&_render=rss&max=50&nsid=1025993%40N22";
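my $encoding = find_encoding("utf-8"); # look up the encoding once, not per line
# getstore returns the HTTP status code, which is true even on failure
if (is_success(getstore($uri, $f))){
    open my $fh, "<", $f or die $!;
    while (<$fh>){
        # decode once to strip the superfluous utf-8 encoding
        print $encoding->decode($_);
    }
    close $fh;
}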
unlink $f;
JD