perl-unicode

Re: dbi and utf8

2003-12-06 11:30:04
On Sat, Dec 06, 2003 at 10:30:40AM -0500, David Graff wrote:
It would be worthwhile to use some other mode of access to confirm this.
It's possible that non-utf8 text data are being stored into tables in
some way that you don't expect or don't directly control.

  As Etienne has written, it is stored in UCS-2/UTF-16. I could
  use UTF-16 in perl as well - after all, I can recode anything
  as long as no information is lost. But information does seem
  to be lost here.
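
  To illustrate what I mean by recoding without loss, here is a
  minimal sketch. It assumes the Encode module that ships with
  perl 5.8 and writes the schwa as U+0259; the variable names
  are only for illustration and it does not touch the database.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+0259 LATIN SMALL LETTER SCHWA as a Perl character string
my $schwa = "\x{0259}";

# Encode the same character to UTF-16LE and to UTF-8
my $utf16 = encode( "UTF-16LE", $schwa );
my $utf8  = encode( "UTF-8",    $schwa );

# Decoding either byte string gives back the same single character
my $from16 = decode( "UTF-16LE", $utf16 );
my $from8  = decode( "UTF-8",    $utf8 );

printf "UTF-16LE bytes: %s\n", unpack( "H*", $utf16 );   # 5902
printf "UTF-8 bytes:    %s\n", unpack( "H*", $utf8 );    # c999
printf "round trip ok:  %s\n",
    ( $from16 eq $schwa && $from8 eq $schwa ) ? "yes" : "no";
```

  Both encodings can represent the character, so converting
  between them should not lose anything.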

  I have now created a database "c:\tmp.mdb" with a table "tmp"
  and a text field "tmp" containing a single character, namely
  the schwa (an IPA phonetic character not contained in
  ISO-8859-1).

  The following perl script behaves as if DBI or the ADO driver
  were trying to convert the text to ISO-8859-1. Because that is
  not possible for this character, it seems to be replaced by a
  question mark.

# this is started using
# perl, v5.8.0 built for MSWin32-x86-multi-thread
# Binary build 804 provided by ActiveState Corp.
use strict;
use warnings;
use DBI;
# error handling and clean-up actions are
# intentionally missing from this ad-hoc script
my $dbh = DBI->connect( "dbi:ADO:Provider=Microsoft." .
"Jet.OLEDB.4.0;Data Source=c:\\tmp.mdb;" );
my $sth = $dbh->prepare( "SELECT tmp FROM tmp" );
$sth->execute();
my $row = $sth->fetchrow_hashref;
my $text = $row->{'tmp'};
print "[" . $text . "]"; # prints "[?]"
print "[" . ord( substr( $text, 0, 1 )). "]"; # prints "[63]"
$sth->finish(); $dbh->disconnect();

  Output is:

[?][63]
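
  The same output can be reproduced without any database, which
  is why I suspect a conversion to ISO-8859-1 somewhere. This is
  only a sketch: it assumes the Encode module from perl 5.8 and
  its default behaviour of substituting a question mark for a
  character the target encoding cannot represent. It says
  nothing about which layer actually does the conversion.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The character stored in the table: U+0259 LATIN SMALL LETTER SCHWA
my $schwa = "\x{0259}";

# ISO-8859-1 has no code point for it, so Encode substitutes "?"
my $latin1 = encode( "iso-8859-1", $schwa );

print "[" . $latin1 . "]";                           # prints "[?]"
print "[" . ord( substr( $latin1, 0, 1 ) ) . "]\n";  # prints "[63]"
```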

Data going to or from a database is supposed to pass through DBI without
modification of any sort.

  Then maybe the ADO driver or the Jet engine does some
  conversion? I have already been looking for a switch to
  turn this off.

If you have a utf8-encoded string and put this into a table via an 
insert or update operation, that specific byte sequence should be 
retrievable from the table later on, via a normal query.

  I have inserted a schwa using Microsoft® Access. It appears
  on the screen as a visible schwa character. Something like:


   ###
  #   #
 #     #
       #
 ########   #
 #     # ###
 #     #
  #   #
   ###

  According to Etienne this should be stored as UCS-2/UTF-16.

when you query for that string, you should contact the author of the
dbi:ADO driver module.

  Ok, I will try that, too. Thank you and Etienne.

  (I am reading the mailing list through the web archive.
  This might mean that I read some messages after a delay.)
