Fellow Perlers,
I'm parsing a lot of XML these days, and came across a Yahoo! Pipes feed that
appears to mangle an originating Flickr feed. But the curious thing is, when I
pull the offending string out of the RSS and just stick it in a script, Encode
knows how to decode it properly, while XML::LibXML (and my Unicode-aware
editors) cannot.
The attached script demonstrates the problem: $str contains the bogus-looking character.
Encode, however, seems to properly convert it to the "č" in "Laurinavičius" in
the output. XML::LibXML, OTOH, outputs it as "LaurinaviÄius" -- that is,
broken. (If things look truly borked in this email too, please look at the
attached script.)
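For reference, here is a minimal sketch (in Python rather than Perl, purely for illustration) of one mechanism that produces exactly this kind of output. The UTF-8 bytes for "č" (C4 8D) are misread as Latin-1 and then re-encoded as UTF-8, which is usually called double encoding. Whether Yahoo! Pipes actually does this is an assumption on my part, not something the feed confirms.

```python
# Hypothetical illustration of UTF-8 "double encoding" (mojibake).
# Assumption: the mangling step behaves like step 2 below; this is not
# confirmed for Yahoo! Pipes.
original = "Laurinavičius"

# Step 1: correct UTF-8 encoding; "č" (U+010D) becomes the bytes C4 8D.
utf8_bytes = original.encode("utf-8")

# Step 2: a component wrongly treats those bytes as Latin-1 characters,
# so "č" turns into "Ä" (U+00C4) plus the invisible control char U+008D.
misread = utf8_bytes.decode("latin-1")

# Step 3: re-encoding the misread text as UTF-8 gives double-encoded bytes.
double_encoded = misread.encode("utf-8")

# Repair: decode once as UTF-8, map the characters back to bytes via
# Latin-1, then decode those bytes as UTF-8 a second time.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")

print(utf8_bytes)
print(double_encoded)
print(repaired)
```

If the feed really is double-encoded, a single UTF-8 decode (as with Encode) would leave the original C4 8D bytes intact once written back out, which would look "correct" in a UTF-8 terminal, while a parser that decodes properly would show "Ä" followed by an invisible character.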
So my question is, what gives? Is this truly a broken representation of the
character and Encode just figures that out and fixes it? Or is there something
off with my editor and with XML::LibXML?
FWIW, the character looks correct in my editor when I load it from the original
Flickr feed. It's only after processing by Yahoo! Pipes that it comes out
looking mangled.
Any insights would be appreciated.
Best,
David
Attachment: try.pl