Hi,
On Monday 16 June 2003 08:37 am, Brigitte Jellinek wrote:
> i'm trying to use perl + dbi + dbd::mysql + mysql with unicode.
> as far as i can tell i can write a utf8 string into the database,
> and get back the same sequence of bits, only now it's a 'classical'
> perl-string, not flagged as utf-8.
The crux of the problem is that mysql thinks it knows what it's doing, and is
assuming incoming data is latin1*, and thus storing your bytes as though they
were latin1. When you retrieve the string, it then of course hands perl the
same bytes back with no utf-8 flag set, so perl treats them as latin1, hence
your output.
We're doing the same thing here (storing utf-8 bytes in mysql strings), but
since we have to use perl 5.6, we're using the unpack method of upgrading the
string to utf-8. It seems encode_utf8() should work too, but I haven't had
the pleasure of using the "new" perl 5.8 stuff in production yet, so I don't
know what the problem is there.
What happens if you change your code to use something like the following?
$f = pack('U*', unpack('U0U*', $f)) if defined $f;
# where $f is the data in the field you just pulled
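For perl 5.8 and later, the same upgrade can be sketched with the core Encode module. This is only a minimal sketch, assuming the field holds well-formed utf-8 bytes; the helper name upgrade_field is made up for illustration. Note that the direction needed here is decoding (bytes to characters), so the relevant call is decode() or decode_utf8(), not encode_utf8().

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn utf-8 bytes fetched from the database
# into a perl character string (utf-8 flag on).
sub upgrade_field {
    my ($f) = @_;
    return undef unless defined $f;    # leave NULL fields alone
    # FB_CROAK dies on malformed input instead of silently substituting;
    # LEAVE_SRC keeps decode() from modifying its argument.
    return decode('UTF-8', $f, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

my $bytes = "\xE2\x82\xAC";            # EURO SIGN as three utf-8 bytes
my $chars = upgrade_field($bytes);
printf "%d byte(s) in, %d character(s) out\n",
    length($bytes), length($chars);    # prints "3 byte(s) in, 1 character(s) out"
```

Only apply this to fields that really came back as raw bytes; decoding a string that has already been upgraded would be a mistake.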
(OT: Actually, we've subclassed DBI, so this upgrade is done transparently.
This makes things somewhat nicer; however, SQL operations that depend on
character semantics [such as ORDER BY] still cannot be relied upon.)
Cheers,
nate
*or somesuch 1-byte encoding; mysql doesn't support utf-8, even in version 4,
despite whatever claims they may make on their website. I'm not bitter. No,
sir.
--
Nathaniel W. Turner
http://www.houseofnate.net/
Tel: +1 508 579 1948 (mobile)