Re: unicode/locale question

I have a file that is encoded in utf-8.  When I read it into a Java string
and write it to the database, it gets written properly, but I have problems
when I try to do the same thing in perl.  I can read the file in (and if I
send it out through a cgi, the characters display properly in a browser),
but it does not get written properly to the database.

Is what I am trying to do currently supported? Is it a perl issue (I'm
using perl 5.6), a DBD::Oracle issue, or neither?

Any thoughts?

Thanks in advance.

Nick Ing-Simmons wrote:

Rami Friedman <rami(_at_)corp(_dot_)airmedia(_dot_)com> writes:

I need to read files written in a variety of charsets (Big5, Arabic,
Hebrew, etc) and write their contents to an oracle database.  This
problem is easy to solve in Java where each feed gets converted to a
ucs-2 string, but, if possible, I need to write the code in perl.  Can
this be done?


I am actively working on this at present.
The development track perl is very close to being able to do it.

The third edition of Programming Perl says locales and
unicode don't mix well yet.


Locales are unfortunately rather underspecified.
Knowing the locale name does not tell you what the encoding is.
(If you know of a way please let me know!)

I guess that means I cannot convert from
Big5 to utf-8, for instance.


Work is in progress so you can say:

  open(my $fh, "<:encoding(big5)", $name) or die "Cannot open $name: $!";

then you can read Unicode characters out of the stream.
If you write them to a stream opened thus:

  open(my $oh, ">:utf8", $outname) or die "Cannot open $outname: $!";

Then you will have converted the file.
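
Put together, the two opens above might be used roughly like this. This is
only a sketch: it assumes a perl in which the :encoding() and :utf8 layers
are available, and the helper name transcode_to_utf8 and its arguments are
made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, decoding the input from
# $encoding and writing the output as UTF-8.
sub transcode_to_utf8 {
    my ($in_path, $out_path, $encoding) = @_;

    # The :encoding() layer decodes octets into Unicode characters on read.
    open(my $in, "<:encoding($encoding)", $in_path)
        or die "Cannot read $in_path: $!";

    # The :utf8 layer writes those characters back out as UTF-8 octets.
    open(my $out, ">:utf8", $out_path)
        or die "Cannot write $out_path: $!";

    while (my $line = <$in>) {
        print {$out} $line;
    }

    close($in)  or die "Cannot close $in_path: $!";
    close($out) or die "Cannot close $out_path: $!";
    return;
}
```

A call such as transcode_to_utf8($name, $outname, "big5") would then do the
Big5 to utf-8 conversion asked about above.
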
Alternatively there will be mechanisms to get utf8 encoded data
for storing into a database.
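
For the in-memory case, one possible shape is sketched below. It assumes the
Encode module (bundled with later perls, 5.8 onwards) and its decode() and
encode() functions; the helper name to_utf8_octets is hypothetical. Whether
the database driver wants UTF-8 octets or a character string is a separate
question that this sketch does not answer.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: take octets in $source_encoding and return the same
# text as UTF-8 octets, ready to hand to whatever stores them.
sub to_utf8_octets {
    my ($octets, $source_encoding) = @_;

    # decode() turns octets into a Perl character string.
    # FB_CROAK makes malformed input die instead of being silently replaced.
    my $chars = decode($source_encoding, $octets, Encode::FB_CROAK);

    # encode() turns the character string into UTF-8 octets.
    return encode("UTF-8", $chars);
}
```

For example, to_utf8_octets($line, "big5") would take one line read as raw
Big5 bytes and return the utf-8 bytes for the same text.
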

Is that correct?  Could I instead rely on
the database driver to convert from the foreign charset to unicode?


Thanks in advance to anyone who might be able to help.

--
Nick Ing-Simmons <nik(_at_)tiuk(_dot_)ti(_dot_)com>
Via, but not speaking for: Texas Instruments Ltd.