perl-unicode

Re: i know it's utf-8, how can i force perl to see it that way

2003-06-16 10:30:06
On Mon, Jun 16, 2003 at 02:37:23PM +0200, Brigitte Jellinek wrote:

hi!

i'm trying to use perl + dbi + dbd::mysql + mysql with unicode.

as far as i can tell i can write a utf8 string into the database,
and get back the same sequence of bits, only now it's a 'classical'
perl-string, not flagged as utf-8.

the string i write into the db is 6 characters long:
"ABc\N{greek:alpha}\x{00df}\N{cyrillic:e}"

    character         unicode utf8
                      hex     binary

    A                 0041    01000001
    B                 0042    01000010
    c                 0063    01100011
    greek alpha       03B1    11001110 10110001
    german scharfes s 00DF    11000011 10011111
    cyrillic e        044D    11010001 10001101
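
For reference, the table above can be reproduced with a few lines of Perl. This is only a sketch: the helper name utf8_rows is made up for illustration, and it spells the characters with full Unicode names (via charnames ':full') instead of the short greek:/cyrillic: forms used in the message.

```perl
use strict;
use warnings;
use charnames ':full';
use Encode qw(encode);

# The six-character string from the message, using full character names.
my $str = "ABc\N{GREEK SMALL LETTER ALPHA}\x{00df}\N{CYRILLIC SMALL LETTER E}";

# Hypothetical helper: one "HEX    bits bits ..." row per character.
sub utf8_rows {
    my ($s) = @_;
    my @rows;
    for my $ch (split //, $s) {
        # Encode the single character to UTF-8 octets.
        my $octets = encode('UTF-8', $ch);
        # Render each octet as eight binary digits.
        my $bits = join ' ', map { sprintf('%08b', ord($_)) } split //, $octets;
        push @rows, sprintf('%04X    %s', ord($ch), $bits);
    }
    return @rows;
}

print "$_\n" for utf8_rows($str);
```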

what i get back from the db is

I've reformatted this slightly:

                              binary
    A                         01000001
    B                         01000010
    c                         01100011
                              11001110 10110001
                              11000011 00111111
                              11010001 00111111

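The second byte of each of the last two characters has been changed:
10011111 and 10001101 have both come back as 00111111 (0x3F, an
ASCII '?'). That's more than a dropped high bit; it looks as if
something along the way replaced bytes it couldn't handle.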
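
A quick way to see exactly which bytes differ is to compare the two byte sequences position by position. A minimal sketch, with the byte values copied from the two tables above; the helper name changed_bytes is hypothetical.

```perl
use strict;
use warnings;

# Bytes written to the db, and bytes read back, taken from the tables above.
my @sent = (0x41, 0x42, 0x63, 0xCE, 0xB1, 0xC3, 0x9F, 0xD1, 0x8D);
my @got  = (0x41, 0x42, 0x63, 0xCE, 0xB1, 0xC3, 0x3F, 0xD1, 0x3F);

# Hypothetical helper: describe every position where the two lists differ.
sub changed_bytes {
    my ($sent, $got) = @_;
    my @diff;
    for my $i (0 .. $#$sent) {
        next if $sent->[$i] == $got->[$i];
        push @diff, sprintf('byte %d: %02X -> %02X', $i, $sent->[$i], $got->[$i]);
    }
    return @diff;
}

print "$_\n" for changed_bytes(\@sent, \@got);
# prints:
#   byte 6: 9F -> 3F
#   byte 8: 8D -> 3F
```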

Probably need to solve that before worrying about flagging the
string as utf8 (for which Encode::_utf8_on(...) is okay).
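
As a sketch of that second step, once the bytes come back intact: the example below uses Encode::decode with FB_CROAK rather than _utf8_on, purely because decode also checks that the bytes are well-formed UTF-8 (_utf8_on just sets the flag without looking). The helper name bytes_to_chars is hypothetical, and the byte strings are the last three characters from the tables above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a character string, or undef if the
# input is not well-formed UTF-8.
sub bytes_to_chars {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode with a CHECK value may modify its argument
    return scalar eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
}

my $intact    = "\xCE\xB1\xC3\x9F\xD1\x8D";   # bytes as written to the db
my $corrupted = "\xCE\xB1\xC3\x3F\xD1\x3F";   # bytes as read back

for my $bytes ($intact, $corrupted) {
    my $chars = bytes_to_chars($bytes);
    print defined $chars
        ? "decoded to " . length($chars) . " characters\n"
        : "not valid UTF-8\n";
}
```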

Right now that'll 'work' but the utf8 bytes have been corrupted.
Perhaps the dbi-users mailing list would be a better place for this.
I'm sure others have been here before.

Tim.

p.s. Extending the DBI spec to cover utf8 is high on my to-do list.