Re: DBI and UTF-8


ram(_at_)zedat(_dot_)fu-berlin(_dot_)de said:

  As far as I know, the data base engine stores text using UTF-8.
  ...


It would be worthwhile to confirm this through some other mode of access.
It's possible that non-UTF-8 text is being stored in the tables in some
way that you don't expect or don't directly control.  If some person or
process is inserting non-UTF-8 data into the database, it's very
unlikely that the database engine itself is altering the data (e.g.
converting it to UTF-8) -- as a rule, database engines don't do that
unless a character-set conversion has been explicitly configured.

To say that it "stores text using UTF-8" may simply mean that it
handles character data types in a manner that is "8-bit clean": it
won't mangle or alter bytes that happen to have the high bit set, and
when queried, it will always return exactly what was inserted.
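
To make the byte-level view concrete, here is a small Python sketch (not
Perl/DBI; the sample word is just an assumption for illustration).  It
shows the bytes an 8-bit-clean store would hold for a UTF-8 string, and
what you see if a client reads those same bytes as ISO-8859-1:

```python
# Hypothetical sample text; any string with a non-ASCII character will do.
text = "Zürich"

# What an 8-bit-clean store receives and hands back: the raw UTF-8 bytes.
stored = text.encode("utf-8")

# The same bytes, interpreted two different ways by a client.
as_utf8 = stored.decode("utf-8")         # correct reading
as_latin1 = stored.decode("iso-8859-1")  # wrong reading, bytes unchanged

print(stored.hex(" "))
print(as_utf8)
print(as_latin1)
```

The bytes are identical in both readings; only the interpretation on the
client side differs.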

  Now it seems as if the texts I get from DBI were encoded
  with ISO-8859-1. Could it be possible that DBI is converting
  the UTF-8 obtained from the data base to ISO-8859-1?
  Possibly it considers ISO-8859-1 to be the "default client
  charset"?  ...


I'm not personally familiar with the DBI source code, but I believe DBI
itself should not convert or alter data content in any way (unless there
is a bug in the driver for a given database engine).  Data going to or
from a database is supposed to pass through DBI without modification of
any sort.

  How can I get the utf-8 text stored in the data base?


If you have a UTF-8 encoded string and put it into a table via an
insert or update operation, that specific byte sequence should be
retrievable from the table later on, via a normal query.
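
A round-trip check of this kind might look like the following Python
sketch.  It uses an in-memory SQLite database purely as a stand-in (it
is not your engine and not the ADO driver), and the table and column
names are made up for the example:

```python
import sqlite3

# Hypothetical payload: the UTF-8 bytes of a string with a non-ASCII char.
payload = "Zürich".encode("utf-8")

# In-memory SQLite as a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v BLOB)")
conn.execute("INSERT INTO t (v) VALUES (?)", (payload,))

# Fetch the value back and compare it byte for byte with what went in.
fetched = conn.execute("SELECT v FROM t").fetchone()[0]
round_trip_ok = (fetched == payload)
conn.close()

print(round_trip_ok)
```

Running the same comparison against your own database, through each
access path you have available, shows whether the bytes survive intact.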

If you are specifically inserting a UTF-8 string and are then getting
back something different when you query for that string, you should
contact the author of the dbi:ADO driver module.  Again, it will help
to use other methods of access to the database, so that you can get a
better idea of where the data corruption is happening.
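
When comparing what different access methods return, it helps to look
at the raw bytes rather than at how a terminal renders them.  The
helpers below are a rough Python sketch for that; the function names are
my own, and the classification is only a heuristic:

```python
def classify(raw: bytes) -> str:
    """Rough guess at how a byte string is encoded."""
    if all(b < 0x80 for b in raw):
        return "ascii"        # no high-bit bytes, so nothing to learn
    try:
        raw.decode("utf-8")   # strict decoding raises on invalid input
        return "utf-8"
    except UnicodeDecodeError:
        return "not utf-8"    # e.g. ISO-8859-1 or another 8-bit charset


def hexdump(raw: bytes) -> str:
    """Space-separated hex, for comparing results side by side."""
    return " ".join(f"{b:02x}" for b in raw)


# Hypothetical values, as two different access paths might return them.
from_path_a = "Zürich".encode("utf-8")
from_path_b = "Zürich".encode("iso-8859-1")

print(classify(from_path_a), hexdump(from_path_a))
print(classify(from_path_b), hexdump(from_path_b))
```

If one path returns "c3 bc" for the umlaut and another returns "fc", the
conversion is happening somewhere between the table and the second
client, not in the stored data.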

        Dave Graff