perl-unicode

RE: i know it's utf-8, how can i force perl to see it that way

2003-06-16 07:30:09
I had the same problem, and worked around it by using _utf8_on() from
Encode on the mysql query results.  In my version, it was not exported 
by default, so I added '_utf8_on' to @EXPORT .

However, the Encode documentation states that _utf8_on is an internal
function, and "Do not use unless you know that the STRING is well-formed
UTF-8."

Is there a better way to do this?
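
One candidate, as a minimal sketch only: the core function utf8::decode
converts a string of UTF-8 octets in place and returns false when the
octets are not well-formed, so it acts like a checked version of
_utf8_on. The helper name mark_as_utf8 and the sample strings are made up
for illustration, and this assumes the query results arrive as raw UTF-8
octets.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a query result in place.
# Returns true if the octets were well-formed UTF-8, false otherwise.
sub mark_as_utf8 {
    return utf8::decode($_[0]);   # $_[0] aliases the caller's variable
}

my $good    = "ABc\xCE\xB1";      # "ABc" + UTF-8 octets for greek alpha
my $ok_good = mark_as_utf8($good);
printf "ok=%d length=%d\n", $ok_good ? 1 : 0, length $good;

my $bad    = "\xC3\x3F";          # C3 must be followed by a continuation octet
my $ok_bad = mark_as_utf8($bad);
printf "ok=%d\n", $ok_bad ? 1 : 0;
```

Checking the return value is the point: a false result means the data
should not be treated as UTF-8 at all.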

Also, as a suggestion to the authors/documenters of Encode:  it would
be helpful to have more explanation of (& warnings about) the UTF-8 flag,
how/why it works, functions that manipulate it, and warnings about common
problems, such as the current one.

Mark

-----Original Message-----
From: Brigitte Jellinek [mailto:bjelli(_at_)horus(_dot_)at] 
Sent: 2003 06 16 8:37
To: perl-unicode(_at_)perl(_dot_)org
Subject: i know it's utf-8, how can i force perl to see it that way



hi!

i'm trying to use perl + dbi + dbd::mysql + mysql with unicode.

as far as i can tell i can write a utf8 string into the database,
and get back the same sequence of bits, only now it's a 'classical'
perl-string, not flagged as utf-8.

the string i write into the db is 6 characters long:
"ABc\N{greek:alpha}\x{00df}\N{cyrillic:e}"


    character            unicode   utf8
                         hex       binary

    A                    0041      01000001
    B                    0042      01000010
    c                    0063      01100011
    greek alpha          03B1      1100111010110001
    german scharfes s    00DF      1100001110011111
    cyrillic e           044D      1101000110001101


what i get back from the db is

                              binary

    A                         01000001
    B                         01000010
    c                         01100011
    ?                         11001110
    ?                         10110001
    ?                         11000011
    ?                         00111111
    ?                         11010001
    ?                         00111111


I have tried to convert this using 
      $new = decode_utf8( $fromdb );
but all i get is an empty string.  is there
some way to find out *why* this won't decode?

or is my debugging stuff that shows me the bits in the
string just wrong:


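use Encode qw(is_utf8 encode_utf8);

# Returns one line per character (or per byte, if the string is not
# flagged as UTF-8): the character, its code in hex, and its bits.
sub showbits
{
    my ($string) = @_;
    my $utf      = is_utf8($string);
    my $template = $utf ? "U*" : "C*";
    my $result   = "";
    my $i        = 0;
    foreach my $code ( unpack($template, $string) )
    {
        my $bits;
        if ( $utf and $code > 127 ) {
            # bits of the character's UTF-8 encoding
            $bits = unpack("B*", encode_utf8(chr $code));
        }
        else {
            $bits = unpack("B*", pack("N", $code));
        }
        $bits =~ s/^0{32}//;  # leading zeros
        $bits =~ s/^0{16}//;
        $bits =~ s/^0{8}//;
        $result .= "\n" . substr($string, $i, 1) . "="
                 . sprintf("%04X", $code) . "=" . $bits;
        $i++;
    }
    return $result;
}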
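
A way to see *why* the decode fails, sketched under the assumption that
the nine bytes listed above are exactly what comes back from the db:
calling decode with the FB_CROAK check makes it die on the first
malformed sequence, and the message left in $@ names the offending
octet. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The nine octets from the "what i get back from the db" dump.
my $fromdb = "ABc\xCE\xB1\xC3\x3F\xD1\x3F";

# With FB_CROAK, decode() dies on the first malformed sequence.
my $copy = $fromdb;   # decode() may modify its argument when CHECK is set
my $new  = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
my $why  = $@;

if (defined $new) {
    print "decoded ", length($new), " characters\n";
}
else {
    print "not well-formed UTF-8: $why";
}

# Hex dump of the raw octets, independent of the UTF-8 flag.
my $hex = join " ", map { sprintf "%02X", ord($_) } split //, $fromdb;
print "$hex\n";
```

In the dump, 3F (an ASCII question mark) sits where the continuation
octets 9F and 8D were written, so the data appears to have been altered
before it reached perl; that is a reading of the dump, not something
confirmed here.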

-- 
Brigitte        'I never met a chocolate I didnt like'        Jellinek
bjelli(_at_)horus(_dot_)com                         
http://www.horus.com/~bjelli/
http://perlwelt.horus.at http://www.perlmonks.org/index.pl?node=bjelli