Re: perl, unicode and databases (mysql)


----- Original Message -----
From: "Tim Bunce" <Tim(_dot_)Bunce(_at_)pobox(_dot_)com>
To: "Merijn van den Kroonenberg" <merijn(_at_)e-factory(_dot_)nl>
Sent: Tuesday, August 20, 2002 6:35 PM
Subject: Re: perl, unicode and databases (mysql)

On Tue, Aug 20, 2002 at 06:05:32PM +0200, Merijn van den Kroonenberg wrote:

In general the quote() method should be as aware of utf8 as the
database is.  If the database supports utf8 then the quote() method
should do-the-right-thing or else it's broken and needs fixing.


Well, when i quote it manually:

############################################################
# utf8_quote(string)
sub utf8_quote($){
  my $astring = shift;
  $astring =~ s/(['"\\\0])/\\$1/g;
  return "'".$astring."'";
}# utf8_quote
############################################################

Then I can store and retrieve it just fine. So I guess it supports utf8 ;-)


It may just be storing a sequence of bytes. (You can check by using
SQL functions like LENGTH() and SUBSTRING() on it.)
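
To make the bytes-versus-characters distinction concrete, here is a minimal Perl sketch of what such a check would be looking for. The sample string is an assumption chosen for illustration, and the snippet only shows the Perl side: a database that merely stores bytes would be expected to count the string the way the first length() call does.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "cafe" with an accented e (U+00E9), which UTF-8
# encodes as the two bytes 0xC3 0xA9.
my $octets = "caf\xC3\xA9";

# Seen as raw bytes, the string has 5 elements.
my $byte_length = length($octets);

# Decoded into Perl characters, the same data has 4 elements.
my $chars       = decode('UTF-8', $octets);
my $char_length = length($chars);

print "bytes: $byte_length, characters: $char_length\n";
```

If the database reports the byte count where a character count is expected, it is probably not interpreting the column as utf8 at all.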


Probably yes, but as long as I don't do any manipulation in the database,
like selecting on strings or sorting, it shouldn't matter, right? As long as
the app that retrieves it from the database can work with utf8.


Tim.

Oh yeah, one other thing: since Encode::_utf8_on is an internal function,
wouldn't it be better to use Encode::decode("utf8",$somevar) instead? As far
as I can see, it should do exactly the same, but if I am mistaken, let me
know :)


Encode::_utf8_on *just* sets the internal utf8 flag bit on the value,
which *must* already be valid utf8 (or else you'll get problems later).


I believe Encode::decode is different (but I've never used either and
could easily not know what I'm talking about :)


from perldoc Encode
 CAVEAT: When you run "$string = decode("utf8",
         $octets)", then $string may not be equal to $octets.
         Though they both contain the same data, the utf8 flag
         for $string is on unless $octets entirely consists of
         ASCII data (or EBCDIC on EBCDIC machines).  See "The
         UTF-8 flag" below.

That's why I got that idea: decode also seems to set the utf8 flag and leave
the data alone. Not sure, though.
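
One way to settle the question is to try both calls on the same input. The sketch below is only an illustration with assumed sample strings, and it uses the strict encoding name 'UTF-8' where the thread writes "utf8". For valid input the two approaches should produce equal strings; the difference shows up with invalid input, which decode() checks and _utf8_on() does not.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: valid UTF-8 bytes for "cafe" with an accented e.
my $octets = "caf\xC3\xA9";

# decode() validates the bytes and returns a new character string.
my $decoded = Encode::decode('UTF-8', $octets);

# _utf8_on() flips the flag on its argument in place, with no validation,
# so it is applied to a copy here.
my $flagged = $octets;
Encode::_utf8_on($flagged);

# For valid input both results are the same 4-character string.
my $same_for_valid = ($decoded eq $flagged) ? 1 : 0;
my $flag_is_on     = Encode::is_utf8($decoded) ? 1 : 0;

# Invalid input: a lone 0xE9 byte is not well-formed UTF-8.
# With the default CHECK argument, decode() substitutes U+FFFD for it.
# _utf8_on() would instead mark the malformed bytes as utf8 unchanged.
my $bad         = "caf\xE9";
my $bad_decoded = Encode::decode('UTF-8', $bad);

printf "same for valid input: %d, flag on: %d\n", $same_for_valid, $flag_is_on;
printf "last code point after decoding bad input: U+%04X\n",
    ord(substr($bad_decoded, -1));
```

So for data that is known to be valid the two look interchangeable, but decode() is the safer choice when the bytes come from outside the program.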