David E. Wheeler wrote on 15.06.2010 at 22:55 (-0700):
> But the curious thing is, when I pull the offending string out of
> the RSS and just stick it in a script, Encode knows how to decode it
> properly, while XML::LibXML (and my Unicode-aware editors) cannot.

Try passing the parser options as a hash reference:

    my $doc = $parser->parse_html_string($str, {encoding => 'utf-8'});

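For context, here is a fuller sketch of that call. The sample markup, the `recover` option and the XPath lookup are my own additions for illustration, not taken from your feed; the assumption is that `$str` holds raw UTF-8 octets rather than an already decoded string.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample input: UTF-8 octets, "\xc3\xa9" is e-acute.
my $str = "<html><body><p>caf\xc3\xa9</p></body></html>";

my $parser = XML::LibXML->new;

# Tell the HTML parser which encoding the octets are in;
# recover => 1 lets it carry on over sloppy markup.
my $doc = $parser->parse_html_string(
    $str,
    { encoding => 'utf-8', recover => 1 },
);

# XML::LibXML hands back decoded text strings, so this counts characters.
my $text = $doc->findvalue('//p');
printf "%d characters\n", length $text;
```

If the character count comes out higher than expected, the input was probably read as Latin-1 somewhere along the way.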
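To print Unicode text strings (as opposed to octet strings) correctly
to a UTF-8 terminal, add the following line before the first output:

    binmode STDOUT, ':utf8';

For a terminal that is not UTF-8, use an ':encoding(...)' layer naming
the terminal's encoding instead. But note that STDOUT is global, so the
layer applies to everything the program prints afterwards.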
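The difference between octet strings and text strings can be seen with core modules alone. This is only a sketch: the sample string is made up, and it writes through an in-memory handle (my choice for the demonstration) instead of STDOUT, so the global handle is left alone.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: the UTF-8 octets for "cafe" with e-acute.
my $octets = "caf\xc3\xa9";

# Decode once at the input boundary to get a text string.
my $text = decode('UTF-8', $octets);

# Put the :utf8 layer on an in-memory handle rather than on STDOUT,
# so the example does not change the global handle.
my $buffer = '';
open my $out, '>', \$buffer or die "cannot open in-memory handle: $!";
binmode $out, ':utf8';
print {$out} $text;
close $out;

# Report lengths only, so this line is plain ASCII on any terminal.
printf "%d octets in, %d characters decoded, %d octets out\n",
    length($octets), length($text), length($buffer);
```

The same `binmode` call on STDOUT does for the terminal what it does here for `$buffer`: text strings go in, encoded octets come out.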
Hope this helps!
--
Michael Ludwig