On Sat, Dec 06, 2003 at 10:30:40AM -0500, David Graff wrote:
It would be worthwhile to use some other mode of access to confirm this.
It's possible that non-utf8 text data are being stored into tables in
some way that you don't expect or don't directly control.
As Etienne has written, it is stored in UCS-2/UTF-16. I could
use UTF-16 in Perl as well; after all, I can recode anything
as long as no information is lost. But information loss is
exactly what seems to happen here.
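To make the "recode without loss" point concrete, here is a minimal sketch using Perl's core Encode module: the schwa (U+0259) survives a round trip through UTF-16 unchanged. Choosing little-endian UTF-16LE is an assumption about how Jet lays out UCS-2 on Windows, not something taken from the Jet documentation.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+0259 LATIN SMALL LETTER SCHWA, the character stored in the test table
my $schwa = "\x{0259}";

# Encode to UTF-16LE (assumed byte order; no BOM is written for "UTF-16LE")
my $bytes = encode('UTF-16LE', $schwa);

# Show the raw bytes as hex: low byte first for little-endian
my $hex = uc unpack('H*', $bytes);
print "bytes: $hex\n";    # bytes: 5902

# Decoding gives back the original character, so nothing was lost
my $back = decode('UTF-16LE', $bytes);
print "lossless\n" if $back eq $schwa;
```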
I have now created a database "c:\tmp.mdb" with a table "tmp"
and a text field "tmp" containing a single character, namely
the schwa (an IPA phonetic character not contained in
ISO-8859-1).
The following Perl script behaves as if DBI or the ADO driver
tried to convert the text to ISO-8859-1. Because the schwa cannot
be represented in that character set, it seems to be replaced by
a question mark.
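A lossy conversion to ISO-8859-1 reproduces the observed output exactly. The sketch below uses Perl's Encode module purely as an illustration of that hypothesis; it does not show what the ADO driver or the Jet engine actually do internally, which is still the open question.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The schwa has no code point in ISO-8859-1
my $schwa = "\x{0259}";

# With the default CHECK value, encode() substitutes unmappable
# characters; for ISO-8859-1 the substitution character is "?"
my $latin1 = encode('iso-8859-1', $schwa);

# Same two prints as the test script further down
print "[" . $latin1 . "]";                    # prints "[?]"
print "[" . ord(substr($latin1, 0, 1)) . "]"; # prints "[63]"
print "\n";
```

Passing `Encode::FB_CROAK` as the third argument to `encode` makes it die on the unmappable character instead of silently substituting "?", which can help show whether such a conversion is happening on the Perl side at all.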
# this is started using
# perl, v5.8.0 built for MSWin32-x86-multi-thread
# Binary build 804 provided by ActiveState Corp.
use strict;
use warnings;
use DBI;
# error handling and clean-up actions are
# intentionally missing from this ad-hoc script
my $dbh = DBI->connect( "dbi:ADO:Provider=Microsoft." .
"Jet.OLEDB.4.0;Data Source=c:\\tmp.mdb;" );
my $sth = $dbh->prepare( "SELECT tmp FROM tmp" );
$sth->execute();
my $row = $sth->fetchrow_hashref;
my $text = $row->{'tmp'};
print "[" . $text . "]"; # prints "[?]"
print "[" . ord( substr( $text, 0, 1 )). "]"; # prints "[63]"
$sth->finish(); $dbh->disconnect();
Output is:
[?][63]
Data going to or from a database is supposed to pass through DBI without
modification of any sort.
Then maybe the ADO driver or the Jet engine does some
conversion? I have already been looking for a switch to
turn it off.
If you have a utf8-encoded string and put this into a table via an
insert or update operation, that specific byte sequence should be
retrievable from the table later on, via a normal query.
I have inserted a schwa using Microsoft® Access. It appears
on the screen as a visible schwa character, something like:
###
# #
# #
#
######## #
# # ###
# #
# #
###
According to Etienne this should be stored as UCS-2/UTF-16.
when you query for that string, you should contact the author of the
dbi:ADO driver module.
OK, I will try that, too. Thanks to you and Etienne.
(I am reading the mailing list through the web archive.
This might mean that I read some messages after a delay.)