perl-unicode

RE: UTF-16 -> UTF-8

2001-11-21 08:13:21
Philip,

It did work, although only partially. I can now read from a UTF-8 file and
write to a UTF-16 file without problems, and I can finally read the contents
of the UTF-16 file in Notepad. I can't read it in Word, though; I think this
has to do with Word not recognizing the text file as an encoded one.
I still can't write to the DB, though. The SQL append instruction has no effect.

So that you know what we've been doing, I'm sending the code and the input text
file. It contains just four characters: uppercase and lowercase i and u, each
with a tilde (~). I need them because I have to process some medieval
Portuguese texts that include these characters.

The input file is sent as an attachment.
Regards.

Rui

Here is the code:

# Test of accessing an Access DB from Perl
# (reads a file and writes to the DB)

use Win32::OLE;
use Win32::OLE::Const 'Microsoft ActiveX Data Objects';

use Unicode::String qw(utf8 latin1);

open(FICH1,"fich1.txt")||die"Could not open file fich1.txt: $!";
open(FICH3,">fich3.txt")||die"Could not open file fich3.txt: $!";

$constr = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=c:\\Shared\\MyDB.mdb";

$conn = Win32::OLE->new('ADODB.Connection') || die ("Darn, I'm dead already..");
$conn->open($constr);

while (<FICH1>) {
        chomp($_);
        $palavra_utf8=utf8($_);
        $palavra_utf8->byteswap;
        $palavra_utf16=$palavra_utf8->utf16;
        $sql =  "INSERT INTO Tipo_Referencia ( Descricao ) SELECT VALUES ('$palavra_utf16');";
        print FICH3 $palavra_utf16;
        $conn->execute($sql,,,adExecuteNoRecords);

}

$conn->close;


close(FICH1);
close(FICH3);
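
One possible reason the INSERT has no effect, offered as a guess rather than a diagnosis: as far as I know, Jet SQL has no "INSERT ... SELECT VALUES (...)" form, and a single-row append is normally written "INSERT INTO table (column) VALUES (...)". Separately, Perl collapses the empty slots in execute($sql,,,adExecuteNoRecords), so the constant arrives as the second argument (RecordsAffected) instead of the third (Options). The sketch below only builds the statement text, so it runs without Windows or ADO. The helper build_insert and the sample value d'Avis are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): build a single-row
# INSERT statement, doubling any single quote inside the value so that
# it cannot terminate the SQL string literal early.
sub build_insert {
    my ($table, $column, $value) = @_;
    (my $quoted = $value) =~ s/'/''/g;
    return "INSERT INTO $table ($column) VALUES ('$quoted')";
}

# Made-up sample value containing a single quote.
my $sql = build_insert('Tipo_Referencia', 'Descricao', "d'Avis");
print "$sql\n";
# INSERT INTO Tipo_Referencia (Descricao) VALUES ('d''Avis')

# With the connection from the script above, the call would then pass
# undef for RecordsAffected so the option lands in the third position
# (untested here, since it needs Windows and ADO):
#   $conn->Execute($sql, undef, adExecuteNoRecords);
```

This addresses only the statement syntax and the argument position. Whether the UTF-16 bytes survive being interpolated into the SQL string is a separate question that this sketch does not answer.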

-----Original Message-----
From: Philip Newton [mailto:Philip(_dot_)Newton(_at_)gmx(_dot_)net]
Sent: Wednesday, 21 November 2001 7:14
To: Rui Ribeiro
Cc: perl-unicode(_at_)perl(_dot_)org
Subject: Re: UTF-16 -> UTF-8


On Wed, 21 Nov 2001 00:22:04 -0000, in perl.unicode you wrote:

> Thank you for your help.

Hope it was of some help :)

> > But you said you wanted to convert from UTF-8 to UTF-16. So you probably
> > want something like
> >
> >     $palavra_objeito = utf8($_);
> >     $palavra_em_utf16 = $palavra_objeito->utf16;
>
> We've tried just that and the result wasn't what we expected...

What was the input? What was the output you expected? What was the
output you observed?
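
To make those three questions concrete, here is a minimal sketch of the expected bytes for the four-character test file, assuming the input really is UTF-8 and the wanted output is little-endian UTF-16 preceded by a byte order mark. It uses the Encode module rather than Unicode::String, which is a swapped-in technique and not what the script in this thread does. The helper names utf8_to_utf16le and hex_dump are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: turn UTF-8 bytes into UTF-16 little-endian bytes.
# The 'UTF-16LE' encoding does not add a byte order mark.
sub utf8_to_utf16le {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);   # bytes -> Perl characters
    return encode('UTF-16LE', $chars);          # characters -> UTF-16LE bytes
}

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# UTF-8 bytes for I, i, U, u with tilde (U+0128, U+0129, U+0168, U+0169).
my $input = "\xC4\xA8\xC4\xA9\xC5\xA8\xC5\xA9";

# Byte order mark U+FEFF as it appears in little-endian byte order.
my $bom = "\xFF\xFE";

my $utf16 = utf8_to_utf16le($input);
print hex_dump($bom . $utf16), "\n";
# FF FE 28 01 29 01 68 01 69 01
```

Comparing a hex dump of the observed output file against these bytes shows at a glance whether the byte order is swapped or the byte order mark is missing. When writing such bytes to a file on Windows, the handle should be put in binary mode with binmode, since otherwise any 0x0A byte is expanded to 0x0D 0x0A.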

Cheers,
Philip

Attachment: fich1.txt
Description: Text document
