perl-unicode

DBD:ADO, DBI and UTF-8 (was: dbi and utf8)

2003-12-07 17:30:05
  (Intro: The problem is how to retrieve a Unicode
  character from a Jet database using DBI and DBD-ADO.)

On Sat, Dec 06, 2003 at 11:47:51PM +0800, Autrijus Tang wrote:
Hence, you'd need to explicitly convert bytestrings returned by
DBI into ustrings, using either utf8::decode, or Encode::decode_utf8.

  Thank you, I have tried this in several ways (listing below).

  It always seems as if the Unicode character
025A    602     LATIN SMALL LETTER SCHWA WITH HOOK
  contained in the Jet database cannot be converted
  into anything anymore, because the information has
  already been lost, possibly due to some driver "converting"
  it to ISO-8859-1.
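
  To illustrate why nothing can be recovered once such a conversion
  has happened, here is a minimal sketch. It assumes the lossy step
  behaves like an encode to ISO-8859-1 with a substitution character;
  that is only a stand-in, not confirmed driver behaviour. The
  unmappable U+025A becomes "?" (code 63), and decoding those bytes
  afterwards still yields "?".

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# U+025A, the character stored in the database.
my $original = "\x{025A}";

# Stand-in for the suspected lossy conversion (an assumption, not the
# actual driver code): ISO-8859-1 has no U+025A, so encode() substitutes "?".
my $lossy = encode( "iso-8859-1", $original );

# Decoding afterwards cannot bring the character back.
my $decoded = decode( "utf8", $lossy );

print "bytes after lossy step: ", join( " ", map { ord } split //, $lossy ), "\n";
print "code after decoding:    ", ord( $decoded ), "\n";
```

  If the real data path works like this, the fix has to happen before
  the string reaches Perl, not in any decode call afterwards.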

  Now, I am CC-ing this e-mail to Steffen Goeldner;
  perhaps he knows how DBD-ADO handles Unicode
  characters that are not part of the ISO-8859-1
  character set.

  My test script and its output follow. It now shows the
  effects of the various conversions suggested by Autrijus.

use utf8;

use strict;
use warnings;

use DBI;
use DBD::ADO;
use Encode;

print "\$DBI::VERSION = "      . $DBI::VERSION . "\n";
print "\$DBD::ADO::VERSION = " . $DBD::ADO::VERSION . "\n";
print "\$Encode::VERSION = "   . $Encode::VERSION . "\n";

# Print the length, the text itself, and the code of each character.
sub show
{ my( $text )= @_;
  print "\n\nlength = (" . length( $text ) . ")\n";
  print "text = [" . $text . "]\ncharcodes = ";
  for my $i ( 1 .. length( $text ))
  { print "text at $i =[ " . ord( substr( $text, $i - 1, 1 )). " ]\n"; }}

my $dbh = DBI->connect( "dbi:ADO:Provider=Microsoft." .
"Jet.OLEDB.4.0;Data Source=c:\\tmp.mdb;" );

my $sth = $dbh->prepare( "SELECT tmp FROM tmp" );

$sth->execute();

my $row = $sth->fetchrow_hashref;

my $text;

$text = decode( "utf16", $row->{'tmp'} );     show( $text );
$text = decode( "ucs2", $row->{'tmp'} );      show( $text );
$text = decode( "utf8", $row->{'tmp'} );      show( $text );
$text = Encode::decode_utf8( $row->{'tmp'} ); show( $text );
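# utf8::decode() decodes in place and returns a success flag,
# so $text holds that flag here, not the decoded string.
$text = utf8::decode( $row->{'tmp'} );        show( $text );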

$sth->finish(); $dbh->disconnect();

                # This script, started using
                # perl, v5.8.0 built for MSWin32-x86-multi-thread
                # Binary build 804 provided by ActiveState Corp.,
                # outputs the following text:

$DBI::VERSION = 1.30
$DBD::ADO::VERSION = 2.81
$Encode::VERSION = 1.83
UTF-16:Partial character at C:/Perl58/lib/Encode.pm line 154.


length = (0)
text = []
UCS-2BE:Partial character at C:/Perl58/lib/Encode.pm line 154.
charcodes =

length = (0)
text = []
charcodes =

length = (1)
text = [?]
charcodes = text at 1 =[ 63 ]


length = (1)
text = [?]
charcodes = text at 1 =[ 63 ]


length = (1)
text = [1]
charcodes = text at 1 =[ 49 ]
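
  The last result, [1], is the return value of utf8::decode rather
  than decoded text: that function decodes its argument in place and
  returns a success flag, whereas Encode::decode_utf8 returns the
  decoded string. A minimal sketch of the difference follows. It uses
  a hand-written UTF-8 byte string for U+025A (bytes 0xC9 0x9A)
  instead of a database value, which is an assumption made purely
  for illustration.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 encoding of U+025A, written out by hand (illustrative input,
# not data fetched from the database).
my $bytes = "\xC9\x9A";

# utf8::decode() changes its argument in place and returns true on success.
my $in_place = $bytes;
my $ok       = utf8::decode( $in_place );

# Encode::decode_utf8() returns the decoded string instead.
my $returned = Encode::decode_utf8( $bytes );

printf "utf8::decode flag: %s\n", $ok ? "true" : "false";
printf "in place:  length %d, code U+%04X\n", length( $in_place ), ord( $in_place );
printf "returned:  length %d, code U+%04X\n", length( $returned ), ord( $returned );
```

  Either call only helps when the bytes arriving from DBI really are
  UTF-8; a value that has already been reduced to "?" stays "?".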
