perl-unicode

Normalizing an unknown filehandle encoding to utf8

2006-10-05 07:53:44
Hi,

 

I'm trying to normalize a filehandle of unknown encoding to UTF8.  There is a 
lot of documentation about changing/converting data formats but nothing I've 
tried works.  Here is my problem and what I tried to do to solve it.

 

I have a form upload which is allowing my clients to upload address books in 
different formats.  Quite a few people are trying to upload LDIF files exported 
from MS Outlook and often there are internationalized characters in the 
windows-1252 character set.  Here is an example of what I mean:

 

Bjørn Stabel

 

I have a file handle for the uploaded file (it's an IO::File object) and I 
thought I could force the filehandle to convert itself to UTF-8 'on the fly' 
based on some of the examples and readings I've done in the various PerlIO and 
encoding man pages.  However nothing I do seems to work.  Here's what I've 
tried:

 

(Assume $fh is the IO::File object.)

 

binmode ($fh, ":utf8")  or die "trouble $!";

binmode ($fh, ":encoding(utf8)" ) or die "trouble $!";
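For reference, a layer given to binmode or open names the encoding the bytes are already in; it does not detect or convert anything. A minimal sketch of that behaviour, assuming an in-memory handle stands in for the upload and using made-up sample data:

```perl
use strict;
use warnings;

# Illustrative data: "Bjørn" as windows-1252 bytes (0xF8 is "ø").
my $cp1252_bytes = "Bj\xF8rn";

# The layer must match the encoding the bytes are really in.
# Here the data is windows-1252, so that is the layer we declare.
open(my $in, '<:encoding(cp1252)', \$cp1252_bytes) or die "open failed: $!";
my $name = <$in>;
close($in);

# $name now holds Perl characters, not raw bytes.
printf "%d characters, third is U+%04X\n", length($name), ord(substr($name, 2, 1));
```

Declaring ":utf8" on the same bytes would mislabel them rather than convert them, which may be why the two binmode calls above appear to do nothing useful.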

 

Now I can't use 'encoding(latin1)' because only some files are encoded this way.

 

I run into trouble when I try to insert fields from the address book into my 
UTF8 PostgreSQL database.  Right now I can fix it with encode_utf8(...) but I 
have to use that on every single recovered value that gets inserted into the 
database, so it really seems like an ugly workaround.

 

Isn't there some way to normalize a filehandle of unknown encoding to UTF-8?  
All the examples I see seem to suggest this is possible, but I just can't make 
it work.
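One possible approach, sketched here under the assumption that every upload is either valid UTF-8 or windows-1252: read the raw bytes once, test them with a strict UTF-8 decode, transcode from cp1252 if that fails, and hand the rest of the program a fresh handle over UTF-8 bytes. The helper name utf8_handle_from is hypothetical, and the sketch slurps the whole upload into memory.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: takes a read handle of unknown encoding and returns
# a new in-memory read handle whose contents are UTF-8 encoded bytes.
# Assumption: the input is either valid UTF-8 or windows-1252.
sub utf8_handle_from {
    my ($fh) = @_;

    binmode($fh) or die "binmode failed: $!";    # read raw bytes, no layers
    my $bytes = do { local $/; <$fh> };          # slurp the whole upload
    $bytes = '' unless defined $bytes;

    # Strict check: FB_CROAK dies on malformed UTF-8,
    # LEAVE_SRC keeps $bytes untouched while checking.
    my $is_utf8 = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 };

    # Not valid UTF-8: assume windows-1252 and transcode once.
    $bytes = encode('UTF-8', decode('cp1252', $bytes)) unless $is_utf8;

    open(my $out, '<', \$bytes) or die "cannot open in-memory handle: $!";
    return $out;
}
```

With this in place, `my $clean = utf8_handle_from($fh);` would replace $fh for the rest of the parsing, so the conversion happens in one place instead of per field. Plain ASCII passes through unchanged, since it is valid under both encodings; windows-1252 text that happens to form valid UTF-8 sequences would be misdetected, which should be rare for real names.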

 

Thank you for your suggestions,

 

John Napiorkowski
