(Intro: The problem is how to retrieve a Unicode
character from a Jet database using DBI and DBD-ADO.)
On Sat, Dec 06, 2003 at 11:47:51PM +0800, Autrijus Tang wrote:
> Hence, you'd need to explicitly convert bytestrings returned by
> DBI into ustrings, using either utf8::decode, or Encode::decode_utf8.
Thank you. I have tried this in several ways (listed below).
It always seems as if the Unicode character
U+025A (decimal 602) LATIN SMALL LETTER SCHWA WITH HOOK
stored in the Jet database cannot be converted
into anything any more, because the information has
already been lost: the value arrives as a single '?'
(code 63). Possibly some driver "converts" it to
ISO-8859-1, which has no code point for this character.
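The suspected loss can be reproduced without DBI at all. This is a minimal sketch, assuming only the core Encode module: it converts U+025A to ISO-8859-1 the way a lossy driver conversion might. It illustrates the hypothesis; it is not a trace of what the Jet provider or DBD-ADO actually does.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+025A LATIN SMALL LETTER SCHWA WITH HOOK (decimal 602)
my $schwa = "\x{025A}";

# Assumption: this mimics a lossy conversion to ISO-8859-1; it does not
# call the Jet provider. ISO-8859-1 has no code point for U+025A, so
# encode() with the default CHECK substitutes '?' for it.
my $latin1 = encode('iso-8859-1', $schwa);
printf "length = %d, code = %d\n", length($latin1), ord($latin1);

# Decoding the byte again only recovers '?': the schwa is gone for good.
my $back = decode('iso-8859-1', $latin1);
printf "round trip equals original? %s\n", ($back eq $schwa ? 'yes' : 'no');
```

This prints a length of 1 and code 63, the same single '?' that appears in the script output. If the driver does something similar, no decode() call on the fetched value can bring the original character back.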
I am CC-ing this e-mail to Steffen Goeldner;
perhaps he knows how DBD-ADO handles
Unicode characters that are not part of
the ISO-8859-1 character set.
My test script and its output follow. The script now shows
the effects of the various conversions suggested by Autrijus.
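use utf8;      # affects only string literals in this file, not data from DBI
use strict;
use warnings;
use DBI;
use DBD::ADO;  # loaded here only so that $DBD::ADO::VERSION can be printed
use Encode;

# Decoded text may contain wide characters; print it as UTF-8
# instead of triggering "Wide character in print" warnings.
binmode STDOUT, ':utf8';

print "\$DBI::VERSION = " . $DBI::VERSION . "\n";
print "\$DBD::ADO::VERSION = " . $DBD::ADO::VERSION . "\n";
print "\$Encode::VERSION = " . $Encode::VERSION . "\n";

# Print the length, the text itself and the code point of every character.
sub show {
    my ($text) = @_;
    print "\n\nlength = (" . length($text) . ")\n";
    print "text = [" . $text . "]\ncharcodes = ";
    for my $i (1 .. length $text) {
        # substr() offsets are 0-based, so character $i sits at offset $i - 1
        print "text at $i =[ " . ord(substr($text, $i - 1, 1)) . " ]\n";
    }
}

# RaiseError makes DBI die with the error message instead of continuing.
my $dbh = DBI->connect(
    "dbi:ADO:Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\\tmp.mdb;",
    undef, undef, { RaiseError => 1, PrintError => 0 },
);
my $sth = $dbh->prepare("SELECT tmp FROM tmp");
$sth->execute();
my $row = $sth->fetchrow_hashref;

my $text;
$text = decode("utf16", $row->{tmp});      show($text);
$text = decode("ucs2",  $row->{tmp});      show($text);
$text = decode("utf8",  $row->{tmp});      show($text);
$text = Encode::decode_utf8($row->{tmp});  show($text);

# utf8::decode() decodes its argument in place and returns only a
# true/false flag, so "$text = utf8::decode(...)" would store 1.
$text = $row->{tmp};
utf8::decode($text);
show($text);

$sth->finish();
$dbh->disconnect();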
# This script was run with
# perl v5.8.0 built for MSWin32-x86-multi-thread
# (ActiveState binary build 804)
# and printed the following output:
$DBI::VERSION = 1.30
$DBD::ADO::VERSION = 2.81
$Encode::VERSION = 1.83
UTF-16:Partial character at C:/Perl58/lib/Encode.pm line 154.

length = (0)
text = []
UCS-2BE:Partial character at C:/Perl58/lib/Encode.pm line 154.
charcodes =

length = (0)
text = []
charcodes =

length = (1)
text = [?]
charcodes = text at 1 =[ 63 ]

length = (1)
text = [?]
charcodes = text at 1 =[ 63 ]

length = (1)
text = [1]
charcodes = text at 1 =[ 49 ]
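For comparison, here is a minimal sketch of what the suggested conversions do when the bytes are still intact UTF-8. It assumes only the core Encode module and uses a hard-coded byte string instead of a DBI fetch. It also shows why the last block of the output above contains 1: utf8::decode() returns a success flag, not the decoded text.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# The two UTF-8 bytes of U+025A; hard-coded for illustration,
# a real program would get them from a DBI fetch.
my $utf8_bytes = "\xC9\x9A";

# Encode::decode_utf8 returns a new character string; its input is untouched.
my $via_encode = decode_utf8($utf8_bytes);
printf "decode_utf8: length %d, U+%04X\n", length($via_encode), ord($via_encode);

# utf8::decode works in place and returns a true/false success flag,
# so its return value is not the decoded text.
my $copy = $utf8_bytes;
my $ok   = utf8::decode($copy);
printf "utf8::decode returned %d; string is now U+%04X\n", ($ok ? 1 : 0), ord($copy);

# A '?' byte is already valid UTF-8 (plain ASCII), so decoding it changes
# nothing: no conversion can turn it back into U+025A.
my $q        = '?';
my $question = decode_utf8($q);
printf "decoding '?' gives code %d\n", ord($question);
```

Decoding only reinterprets the bytes the driver delivered. Once a character has been replaced by '?' earlier in the chain, the decoded result is simply '?'.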