Hi,
I'm trying to normalize a filehandle of unknown encoding to UTF8. There is a
lot of documentation about changing/converting data formats but nothing I've
tried works. Here is my problem and what I tried to do to solve it.
I have a form upload which is allowing my clients to upload address books in
different formats. Quite a few people are trying to upload LDIF files exported
from MS Outlook and often there are internationalized characters in the
windows-1252 character set. Here is an example of what I mean:
Bjørn Stabel
I have a filehandle for the upload field (it's an IO::File object) and I
thought I could force the filehandle to convert itself to UTF-8 'on the fly'
based on some of the examples and readings I've done in the various PerlIO and
encoding man pages. However nothing I do seems to work. Here's what I've
tried:
(Assume $fh is the IO::File object)
binmode ($fh, ":utf8") or die "trouble $!";
binmode ($fh, ":encoding(utf8)" ) or die "trouble $!";
Now I can't use 'encoding(latin1)' because only some files are encoded this way.
I run into trouble when I try to insert fields from the address book into my
UTF-8 PostgreSQL database. Right now I can fix it with encode_utf8(...) but I
have to use that on every single recovered value that gets inserted into the
database, so it really seems like an ugly workaround.
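To make the workaround concrete, here is a simplified sketch of what it amounts to. The record and its field names (cn, mail) are made up for illustration and the database insert itself is left out; the only real call is encode_utf8 from the Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical record as read from a windows-1252 upload with no
# encoding layer on the filehandle: the "ø" arrives as the byte 0xF8.
my %entry = (
    cn   => "Bj\xF8rn Stabel",
    mail => 'bjorn@example.com',
);

# The workaround: encode every value by hand before it goes to the
# database. Each byte is treated as a Latin-1 character, so 0xF8
# becomes the two-byte UTF-8 sequence 0xC3 0xB8.
my %for_db = map { $_ => encode_utf8( $entry{$_} ) } keys %entry;

# Dump the encoded name as hex bytes to see what would be inserted.
print join( ' ', map { sprintf '%02X', ord } split //, $for_db{cn} ), "\n";
```

This only gives the right result when the upload really is single-byte Latin-1 style data; a file that is already UTF-8 would presumably end up encoded twice, which is part of why I'd rather fix this at the filehandle.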
Isn't there some way to normalize a filehandle of unknown encoding to UTF-8?
All the examples I see seem to suggest this is possible, but I just can't make
it work.
Thank you for your suggestions,
John Napiorkowski