perl-unicode

Re: Variation In Decoding Between Encode and XML::LibXML

2010-06-18 15:41:17
On Jun 18, 2010, at 12:05 AM, John Delacour wrote:

> In this case all talk of iso-8859-1 and cp1252 is a red herring.  I read
> several Italian websites where this same problem is manifest in external
> material such as ads.  The news page proper is correctly encoded and declared
> as utf-8, but I imagine the web designers have reckoned that material from
> advertisers most likely arrives as windows-1252, and so convert it accordingly
> rather than bother to verify its encoding.  As a result, material that arrives
> as utf-8 undergoes a superfluous second encoding.
>
> Here's a way to get the file in question properly encoded:
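A minimal Perl sketch of the double encoding described above, assuming the site decodes an advertiser's UTF-8 bytes as windows-1252 and then serves the page as UTF-8. It uses only Encode's `encode` and `decode`; the sample string `"caf\x{e9}"` is illustrative, chosen because both of its UTF-8 bytes have printable windows-1252 mappings.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# The advertiser's original text as a Perl character string ("café").
my $original = "caf\x{e9}";

# 1. The advertiser sends it correctly encoded as UTF-8 bytes: "caf\xC3\xA9".
my $ad_bytes = encode('UTF-8', $original);

# 2. Assumption: the site decodes those bytes as windows-1252 without checking,
#    so each UTF-8 byte becomes a separate character: "caf\x{c3}\x{a9}".
my $misread = decode('cp1252', $ad_bytes);

# 3. The page is served as UTF-8, so the characters are encoded a second time:
#    "caf\xC3\x83\xC2\xA9".
my $page_bytes = encode('UTF-8', $misread);

# A consumer that decodes the page once, as declared, gets mojibake ("cafÃ©").
my $decoded_once = decode('UTF-8', $page_bytes);

# prints: U+0063 U+0061 U+0066 U+00C3 U+00A9
printf "%s\n", join ' ', map { sprintf 'U+%04X', ord } split //, $decoded_once;
```

The leftover U+00C3 U+00A9 pair is exactly the kind of two-character residue that appears when one UTF-8 encoding layer too many has been applied.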

Yep, that works for me, too. I guess XML::LibXML doesn't decode the content the
same way Encode does: it returns the string with the characters \x{c4}\x{8d},
which are the two UTF-8 bytes of U+010D (č) rather than the single character.
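One way to repair such a string, sketched under assumptions: treat each code point from 0x00 to 0xFF as the byte it was misread from, and decode those bytes as UTF-8 once more. This is not the code from John's message; `fix_double_encoding` is a hypothetical helper. It is a heuristic: it leaves strings alone when they contain code points above 0xFF or do not form valid UTF-8, but a legitimate string that merely looks like UTF-8 bytes (such as a deliberate "Ã©") would still be converted.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: undo one superfluous layer of UTF-8 encoding.
sub fix_double_encoding {
    my ($str) = @_;
    # Code points above 0xFF cannot have come from a single misread byte.
    return $str if $str =~ /[^\x00-\xFF]/;
    # ISO-8859-1 maps code points 0x00-0xFF one-to-one onto bytes.
    my $bytes = encode('iso-8859-1', $str);
    # FB_CROAK makes decode die on malformed UTF-8; eval turns that into undef.
    my $fixed = eval { decode('UTF-8', $bytes, Encode::FB_CROAK) };
    return defined $fixed ? $fixed : $str;
}

# The two characters described above, as returned by the parser.
my $from_parser = "\x{c4}\x{8d}";
my $fixed = fix_double_encoding($from_parser);

# prints: U+010D
printf "U+%04X\n", ord $fixed;
```

Applying the repair per text node, rather than to the whole raw file, keeps correctly encoded material (such as the news page proper) untouched.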

Thanks for the help, everyone. I've got my code parsing all my feeds and 
emitting a valid UTF-8 feed of its own now.

Best,

David