Re: i know it's utf-8, how can i force perl to see it that way

Hi,

On Monday 16 June 2003 08:37 am, Brigitte Jellinek wrote:

> i'm trying to use perl + dbi + dbd::mysql + mysql with unicode.
>
> as far as i can tell i can write a utf8 string into the database,
> and get back the same sequence of bits, only now it's a 'classical'
> perl-string, not flagged as utf-8.


The crux of the problem is that mysql thinks it knows what it's doing, and is 
assuming incoming data is latin1*, and thus storing your bytes as though they 
were latin1.  When you retrieve the string, it then of course tells perl that 
the string is latin1-encoded, hence your output.
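
To make the symptom concrete, here is a minimal sketch of what such a 
string looks like from perl's side.  It assumes perl 5.8.1 or later for 
utf8::is_utf8(), and $from_db is just a stand-in for a field fetched 
through DBI, not real database output.

```perl
use strict;
use warnings;

# The three UTF-8 bytes of the euro sign (U+20AC), standing in for a field
# that came back from the database as a plain, unflagged byte string.
my $from_db = "\xE2\x82\xAC";

# Perl sees three one-byte characters rather than a single euro sign.
my $len_bytes = length $from_db;
my $flagged   = utf8::is_utf8($from_db) ? 1 : 0;

print "length: $len_bytes, flagged as utf-8: $flagged\n";
```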

We're doing the same thing here (storing utf-8 bytes in mysql strings), but 
since we have to use perl 5.6, we're using the unpack method of upgrading the 
string to utf-8.  It seems decode_utf8() from Encode should work too 
(encode_utf8() goes the other way, from characters to bytes), but I haven't 
had the pleasure of using the "new" perl 5.8 stuff in production yet, so I 
don't know what the problem is there.

What happens if you change your code to use something like the following?

$f = pack('U*', unpack('U0U*', $f)) if defined $f;  
# where $f is the data in the field you just pulled
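
For perl 5.8 and later, roughly the same upgrade can be sketched with the 
Encode module instead of pack/unpack.  This is an untested-against-mysql 
sketch, not what we run in production: $f is again a stand-in for the 
fetched field, and the choice of FB_CROAK (die on invalid UTF-8) is my 
assumption about what you would want.  Note also that the meaning of U0 in 
unpack templates was reversed in perl 5.10, so the one-liner above is best 
treated as specific to older perls.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for a field fetched through DBI: raw UTF-8 bytes for
# e-acute (U+00E9) followed by the euro sign (U+20AC).
my $f = "\xC3\xA9\xE2\x82\xAC";

# Decode the bytes into a character string.  FB_CROAK makes invalid UTF-8
# die instead of being silently replaced; LEAVE_SRC keeps decode() from
# modifying its input in place.
$f = decode('UTF-8', $f, Encode::FB_CROAK | Encode::LEAVE_SRC) if defined $f;

# Print code points only, so no wide characters go to STDOUT.
my @code_points = map { sprintf('U+%04X', ord($_)) } split //, $f;
printf("%d characters: %s\n", length($f), join(' ', @code_points));
```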

(OT: Actually, we've subclassed DBI, so this upgrade is done transparently.  
This makes things somewhat nicer; however, SQL operations [such as sorting 
with ORDER BY] still cannot be relied upon.)
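
As a rough idea of what doing this transparently could look like, here is 
a sketch of a per-row helper.  It is not our actual subclass: decode_row() 
is a hypothetical name, it uses Encode (so perl 5.8+) rather than the 
unpack trick, and @raw merely imitates what $sth->fetchrow_array might 
return.  In a DBI subclass you would presumably call something like it 
from an overridden fetch method.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode every defined field of a fetched row from
# UTF-8 bytes to characters, leaving NULLs (undef) untouched.
sub decode_row {
    my @fields = @_;
    return map {
        defined($_)
            ? decode('UTF-8', $_, Encode::FB_CROAK | Encode::LEAVE_SRC)
            : undef
    } @fields;
}

# Stand-in row: "cafe" with an e-acute as UTF-8 bytes, a NULL, and a
# plain ASCII value.
my @raw = ("caf\xC3\xA9", undef, "42");
my @row = decode_row(@raw);

printf("first field has %d characters\n", length($row[0]));
```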

Cheers,
nate

*or somesuch 1-byte encoding; mysql doesn't support utf-8, even in version 4, 
despite whatever claims they may make on their website.  I'm not bitter.  No, 
sir.

-- 
Nathaniel W. Turner
http://www.houseofnate.net/
Tel: +1 508 579 1948 (mobile)