perl-unicode

Re: Variation In Decoding Between Encode and XML::LibXML

2010-06-16 11:03:11
On Jun 16, 2010, at 12:04 AM, Henning Michael Møller Just wrote:

Hello (loved your PostgreSQL presentation at the most recent OSCON, BTW)

Thanks. Come see my tutorial at OSCON this year, if you can: Test-Driven 
Database Development. :-) Not sure I can make a tutorial as entertaining, alas. 
Perhaps if I bring beer for the audience.

Which editor do you use? When I load the script in Komodo IDE 5.2, the string 
looks broken. When I run the script (ActivePerl 5.10.1 on Windows), only the 
second line is correct; the first (no surprise) and third are broken.

Yes, that's how it looks to me in GNU Emacs (compiled from source with Cocoa 
bindings).

Loading the file in UltraEdit-32 13.20+3, set to not convert the script on 
loading, it becomes obvious that what should have been one character is 
represented by 4 bytes, \xC3 \x84 \xC2 \x8D, which modern editors would 
probably show as 2 characters and as broken.

Right.
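
For anyone following along, here is a rough sketch of how those four bytes 
can come about. It assumes the intended character is U+010D (that is my 
reading of the bytes, not something I have confirmed at the source) and that 
its UTF-8 bytes were then treated as Latin-1 characters and encoded a second 
time. The variable names are just for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the intended character is U+010D (c with caron).
my $char = "\x{10D}";

# Correct UTF-8 encoding: two bytes, C4 8D.
my $once = encode('UTF-8', $char);

# Encoding the byte string again treats each byte as a Latin-1
# character (U+00C4, U+008D), which yields four bytes.
my $twice = encode('UTF-8', $once);

# Render the bytes in the same notation used in this thread.
my $dump = join ' ', map { sprintf '\\x%02X', ord } split //, $twice;
print "$dump\n";    # \xC3 \x84 \xC2 \x8D
```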

It looks to me like the string is being displayed as a byte representation of 
the characters, if that makes sense. My English isn't perfect :-/ and what I 
am trying to say is that this is a problem that I am quite familiar with. It 
happens whenever the source and the reader do not agree on whether a string 
is encoded in UTF-8 or not.

Apparently Encode fixes the incorrect string, which is nice. The interesting 
question is where this should be fixed. If it's at Yahoo! Pipes, you'll 
probably have to use Encode as a work-around for some time...

Yes.
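
In case it helps anyone hitting the same thing, a work-around along these 
lines might look like the sketch below. It assumes the data really was UTF-8 
encoded twice, which is a guess based on the bytes quoted above, so it would 
mangle input that is already correct. The variable names are made up for the 
example.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The four bytes reported in this thread.
my $broken = "\xC3\x84\xC2\x8D";

# First decode: two characters, U+00C4 and U+008D.
my $step1 = decode('UTF-8', $broken);

# Both characters fit in a single byte, so map them back to bytes.
my $bytes = encode('ISO-8859-1', $step1);

# Second decode: the single character that was presumably intended.
my $fixed = decode('UTF-8', $bytes);

printf "U+%04X\n", ord $fixed;    # U+010D
```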

Best,

David