perl-unicode

Re: perl, unicode and databases (mysql)

2002-08-21 13:22:15

----- Original Message -----
From: "Tim Bunce" <Tim(_dot_)Bunce(_at_)pobox(_dot_)com>
To: "Merijn van den Kroonenberg" <merijn(_at_)e-factory(_dot_)nl>
Sent: Tuesday, August 20, 2002 6:35 PM
Subject: Re: perl, unicode and databases (mysql)


On Tue, Aug 20, 2002 at 06:05:32PM +0200, Merijn van den Kroonenberg
wrote:

In general the quote() method should be as aware of utf8 as the
database is.  If the database supports utf8 then the quote() method
should do-the-right-thing or else it's broken and needs fixing.

Well, when I quote it manually:

############################################################
# utf8_quote(string)
sub utf8_quote($){
  my $astring = shift;
  $astring =~ s/(['"\\\0])/\\$1/g;
  return "'".$astring."'";
}# utf8_quote
############################################################

Then I can store and retrieve it just fine. So I guess it supports utf8
;-)

It may just be storing a sequence of bytes. (You can check by using
SQL functions like LENGTH() and SUBSTRING() on it.)
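
To illustrate the byte-versus-character difference that LENGTH() would
reveal, here is a rough sketch on the Perl side. The sample string is
made up for illustration. If the database reports the larger number for
a value like this, it is most likely treating it as plain bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# U+00E9 (e-acute) followed by U+20AC (euro sign), as raw UTF-8 octets.
my $octets = "\xC3\xA9\xE2\x82\xAC";

# Decode the octets into a Perl character string.
my $chars = decode("UTF-8", $octets);

my $byte_len = length($octets);   # counts octets
my $char_len = length($chars);    # counts characters

print "bytes: $byte_len, characters: $char_len\n";
```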

Probably yes, but as long as I don't do any manipulation in the database,
like selecting on strings or sorting, it shouldn't matter, right? As long as
the app that retrieves it from the database can work with UTF-8.
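
A minimal sketch of that idea, assuming the database really does hand
back the same bytes it was given: encode to octets before storing and
decode after fetching. The %fake_table hash is a hypothetical stand-in
for a table column, not a real DBI call.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for a database column that stores raw bytes.
my %fake_table;

# A character string containing non-ASCII characters.
my $original = "na\x{EF}ve \x{263A}";

# Store: turn characters into UTF-8 octets at the boundary.
$fake_table{1} = encode("UTF-8", $original);

# Fetch: the value comes back as octets, so decode it again.
my $fetched   = $fake_table{1};
my $roundtrip = decode("UTF-8", $fetched);

my $ok = ($roundtrip eq $original) ? 1 : 0;
print "round trip ", ($ok ? "ok" : "FAILED"), "\n";
```

This only holds while the database treats the value as opaque. Sorting,
comparing or substring operations done in SQL would still see bytes.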


Tim.

Oh yeah, one other thing: since Encode::_utf8_on is an internal function,
wouldn't it be better to use Encode::decode("utf8",$somevar) instead? As
far as I can see, it should do exactly the same, but if I am mistaken,
let me know :)

Encode::_utf8_on *just* sets the internal utf8 flag bit on the value,
which *must* already be valid utf8 (or else you'll get problems
later).

I believe Encode::decode is different (but I've never used either and
could easily not know what I'm talking about :)

from perldoc Encode
 CAVEAT: When you run "$string = decode("utf8",
         $octets)", then $string may not be equal to $octets.
         Though they both contain the same data, the utf8 flag
         for $string is on unless $octets entirely consists of
         ASCII data (or EBCDIC on EBCDIC machines).  See "The
         UTF-8 flag" below.

That's why I got that idea, so I wondered, because it also seems to set the
utf8 flag and leave the data alone. Not sure though.
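
For what it's worth, here is a small sketch comparing the two on made-up
input. For valid UTF-8 both end up with the same string. The difference
shows with malformed input: decode() checks the octets and, with its
default CHECK setting, substitutes U+FFFD for bad bytes, while
_utf8_on() does no checking at all.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Valid UTF-8 octets for U+00E9.
my $octets = "\xC3\xA9";

# Internal function: only flips the utf8 flag, no validation.
my $flagged = $octets;
Encode::_utf8_on($flagged);

# Public API: validates the octets and returns a character string.
my $decoded = decode("UTF-8", $octets);

my $same = ($flagged eq $decoded) ? 1 : 0;
print "valid input gives the same string: ", ($same ? "yes" : "no"), "\n";

# Malformed input: a lone 0xE9 (Latin-1 e-acute) is not valid UTF-8.
my $bad         = "\xE9t";
my $decoded_bad = decode("UTF-8", $bad);

my $replaced = ($decoded_bad =~ /\x{FFFD}/) ? 1 : 0;
print "malformed byte was substituted: ", ($replaced ? "yes" : "no"), "\n";
```

So decode() looks like the safer choice whenever the input might not be
clean UTF-8.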


