On Jun 18, 2010, at 12:05 AM, John Delacour wrote:
In this case all talk of iso-8859-1 and cp1252 is a red herring. I read
several Italian websites where this same problem is manifest in external
material such as ads. The news page proper is encoded properly and declared
as utf-8, but I imagine the web designers have reckoned that the stuff they
receive from the advertisers is most likely to arrive as windows-1252, and
convert accordingly rather than bother to verify the encoding. As a result,
material that actually arrives as utf-8 undergoes a superfluous second encoding.
Here's a way to get the file in question properly encoded:
Yep, that works for me, too. I guess XML::LibXML isn't using Encode in the same
way to decode content, as it returns the string as the two characters
\x{c4}\x{8d} (the UTF-8 bytes of one character, left undecoded).
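
For anyone following along, the double encoding described above can be reproduced in a few lines. This is only an illustrative sketch in Python, not the Perl code from this thread. The helper name undo_double_encoding is made up for the example, and Latin-1 stands in for windows-1252 because it maps every byte to a character.

```python
def undo_double_encoding(text: str) -> str:
    """Repair text whose UTF-8 bytes were mistakenly decoded as Latin-1.

    Hypothetical helper for illustration. If the text does not look like
    mis-decoded UTF-8, it is returned unchanged.
    """
    try:
        # Turn each character back into the byte it came from,
        # then decode those bytes as the UTF-8 they really were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "\u010d"                        # one character, U+010D
utf8_bytes = original.encode("utf-8")      # the two bytes C4 8D
mojibake = utf8_bytes.decode("latin-1")    # two characters, \xc4 \x8d
double_encoded = mojibake.encode("utf-8")  # the superfluous second encoding
repaired = undo_double_encoding(mojibake)  # back to the single character

print(utf8_bytes, repr(mojibake), double_encoded, repr(repaired))
```

The try/except is a deliberate choice: text that was never double-encoded usually fails one of the two steps and is passed through untouched, though a heuristic like this can still misfire on input that happens to be valid both ways.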
Thanks for the help, everyone. I've got my code parsing all my feeds and
emitting a valid UTF-8 feed of its own now.
Best,
David