perl-i18n

Encoding of data from a PostgreSQL db using DBI

2003-05-20 15:29:55
Honorable Perl Hackers,

I'm really desperate and I must apologize for posting without lurking. 

My problem is as follows:

I'm retrieving data from a PostgreSQL database that is encoded in what
PostgreSQL calls UNICODE, which, I assume, means UTF8. At least, if I do
nothing, I see two weird characters in place of each Norwegian letter I
expect... :-)

I'm using the really great module Postscript::MailLabels to generate mail
labels on the basis of this data, but Postscript::MailLabels needs Latin1
input. 

So, I need to translate from UTF8 to Latin1. 

I've found many modules on CPAN to do this, but I can't get any of them to
do what I want... I think I've missed something conceptually important
(and I wish I hadn't gotten into this messy project where I get too little
time to sit down and learn things).

I've tried Unicode::MapUTF8, Unicode::Map8 and Unicode::Map, passing the
string, which is UTF8 encoded, to the method that I thought would convert
it. Most of the time, the Norwegian characters just disappear; sometimes
the whole string disappears. But perhaps I should somehow declare that the
string _is_ UTF8 before trying to convert it...?

My latest attempt is to use Unicode::String, something like this:
use Unicode::String qw(utf8 utf16 latin1);
Unicode::String->stringify_as("utf8");

[snip lots of other stuff]

        my $us = Unicode::String->new();
        my $tmp = $us->utf8(${$data}{$kid}{'navn'});
        $navn = $us->latin1(${$data}{$kid}{'navn'});

${$data}{$kid}{'navn'} is the string which contains the UTF8-coded data. 
This apparently only removes the Norwegian characters. At least I can't
see them in any of my output. 

What am I doing wrong?
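
A minimal sketch of the conversion being asked for, using the Encode
module bundled with Perl 5.8. It assumes that DBI hands back the column as
raw UTF8 bytes (without Perl's internal UTF8 flag set); the helper name
utf8_to_latin1 and the sample string are hypothetical and not part of any
module mentioned above:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn a string of UTF-8 bytes into Latin-1 bytes.
# Assumption: the input is raw octets, not an already-decoded Perl string.
sub utf8_to_latin1 {
    my ($octets) = @_;
    my $chars = decode('UTF-8', $octets);   # UTF-8 bytes -> Perl characters
    return encode('iso-8859-1', $chars);    # Perl characters -> Latin-1 bytes
}

# Sample input: "Bjørn" as UTF-8 bytes, where ø is the two bytes C3 B8.
my $navn = utf8_to_latin1("Bj\xC3\xB8rn");

# Show the resulting bytes in hex so they can be inspected on any terminal.
print join(' ', map { sprintf '%02X', ord } split //, $navn), "\n";
```

If the driver already returns decoded character strings, the decode step
would be wrong and only the encode step would apply. Characters with no
Latin1 equivalent are replaced with a substitution character by default.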

I have also been experimenting with setting the PostgreSQL client to use
an encoding, e.g.:
$rv  = $dbh->do("SET CLIENT_ENCODING TO 'LATIN1';");

This seems to result in 7-bit text: for example, ø is converted to x.

Funnily, I get the same result with 
$rv  = $dbh->do("SET CLIENT_ENCODING TO 'UNICODE';");

Even more strange is that if I do \encoding on the psql command line
(the output from this client in the terminal shows Norwegian letters
correctly), it says SQL_ASCII or something... Huh?

My box is a simple laptop running RH 8.0, with a 2.4.20 Linux kernel, my
Perl installation is therefore v5.8.0.

O, hackers of great wisdom, how do I do this correctly?


Yours Confusedly,

Kjetil
-- 
Kjetil Kjernsmo
Recent astrophysics graduate                  Problems worthy of attack
University of Oslo, Norway            Prove their worth by hitting back
E-mail: kjetikj(_at_)astro(_dot_)uio(_dot_)no                  - Piet Hein
Homepage <URL:http://folk.uio.no/kjetikj/>
Webmaster(_at_)skepsis(_dot_)no                 OpenPGP KeyID: 6A6A0BBC



