perl-unicode

Re: Variation In Decoding Between Encode and XML::LibXML

2010-06-16 04:35:22
David E. Wheeler wrote on 15.06.2010 at 22:55 (-0700):

> But the curious thing is, when I pull the offending string out of
> the RSS and just stick it in a script, Encode knows how to decode it
> properly, while XML::LibXML (and my Unicode-aware editors) cannot.

Try passing the parser options as a hash reference:

  my $doc = $parser->parse_html_string($str, {encoding => 'utf-8'});
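
As a minimal sketch of that call in context: the sample document and
the variable names below are made up for illustration, and the input
is assumed to be UTF-8 octets as read from a file or socket. With the
encoding option given, the values read back from the document should
be Perl text strings rather than octets.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample: a tiny HTML document as UTF-8 octets.
# "\xC3\xA9" is the two-octet UTF-8 sequence for e-acute (U+00E9).
my $octets = "<html><body><p>caf\xC3\xA9</p></body></html>";

my $parser = XML::LibXML->new;

# Tell the HTML parser how the octets are encoded.
my $doc = $parser->parse_html_string($octets, { encoding => 'utf-8' });

# findvalue returns the string value of the matched node as text.
my $text = $doc->findvalue('//p');

printf "%d characters\n", length $text;
```

Comparing length($text) with the octet count of the same word is a
quick way to check whether the decoding step actually happened.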

To print Unicode text strings (as opposed to octet strings) correctly
to a UTF-8 terminal, add the following line before the first output
(for a terminal in another encoding, use the matching
':encoding(...)' layer instead):

  binmode STDOUT, ':utf8';

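But note that STDOUT is global, so the layer stays in effect for all
later output from the program, including output from other modules.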
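
If changing the global handle is a concern, one possible way around it
is to put the layer on a duplicate of STDOUT. This is only a sketch:
the sample string and the name $out are made up for illustration, and
it uses the stricter ':encoding(UTF-8)' layer rather than ':utf8'.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: "cafe" with e-acute, as UTF-8 octets.
my $octets = "caf\xC3\xA9";

# Decode octets into a text string (characters).
my $text = decode('UTF-8', $octets);

# Duplicate STDOUT and apply the encoding layer to the copy only,
# leaving the layers on the global STDOUT untouched.
open(my $out, '>&', \*STDOUT) or die "Cannot dup STDOUT: $!";
binmode $out, ':encoding(UTF-8)';

print {$out} $text, "\n";
close $out or die "Cannot close duplicate handle: $!";

# Encoding the text string again yields the original octets.
my $roundtrip = encode('UTF-8', $text);
```

The duplicate has its own buffer, so close or flush it before writing
to STDOUT again if the order of the output matters.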

Hope this helps!
-- 
Michael Ludwig