perl-unicode

Re: UTF-16 -> UTF-8

2001-11-20 12:34:21
On Tue, 20 Nov 2001 16:35:25 -0000, in perl.unicode you wrote:

open(FICH1,"fich1.txt")||die"Nao foi possivel abrir o ficheiro fich1.txt";
open(FICH3,">fich3.txt")||die"Nao foi possivel abrir o ficheiro fich3.txt";

Good that you check for success, but you should also include the reason
-- it's in $!. For example:

    open(FICH1, "fich1.txt") || die "Nao foi possivel abrir " .
                                    "o ficheiro fich1.txt: $!";

use utf8;

You shouldn't need that. Unicode::String will do all the Unicodery for
you; your program only needs to handle 'plain' bytes.

while (<FICH1>) {
      chomp($_);
      $palavra1=$_;
      @array=split(/ /,$palavra1);

What do you use $palavra1 and @array for? (And @array is usually a bad
variable name.)

      $palavra2=utf16($_);

Here is a mistake. If you call utf16($_), it means "$_ is a string
encoded in UTF-16. Take it and convert it into a Unicode::String
object."

But you said you wanted to convert from UTF-8 to UTF-16. So you probably
want something like

    $palavra_objeito = utf8($_);
    $palavra_em_utf16 = $palavra_objeito->utf16;

Note that ->utf16 will return UTF-16BE, as I understand it, since
"Internally a Unicode::String object is a string of 2 byte values in
network byte order (big-endian)" (quote from the docs). So if your
database and/or file wants UTF-16LE (which is more natural for Intel
chips), then you need to do something such as

    $palavra_objeito->byteswap;

first (after you assign to $palavra_objeito and before you call ->utf16)
to convert from big-endian to little-endian.
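Put together, the whole conversion might look like the sketch below. The
input string is just an illustrative value I made up (the UTF-8 bytes for
U+00E9), and I am assuming that ->byteswap called in void context swaps the
object in place, which is how I read the docs; please check against your
installed version of Unicode::String.

```perl
use strict;
use warnings;
use Unicode::String qw(utf8);

# Illustrative input (an assumption, not from the original program):
# the UTF-8 encoding of U+00E9, which is two bytes long.
my $utf8_bytes = "\xC3\xA9";

# Step 1: decode *from* UTF-8 into a Unicode::String object.
my $palavra_objeito = utf8($utf8_bytes);

# Step 2: encode *to* UTF-16. Per the docs quoted above, this is big-endian.
my $utf16_be = $palavra_objeito->utf16;

# Optional: swap the object's bytes in place (void context), then encode
# again to get the little-endian byte order.
$palavra_objeito->byteswap;
my $utf16_le = $palavra_objeito->utf16;

# Hex dumps make the byte order visible.
printf "BE: %s\n", unpack("H*", $utf16_be);
printf "LE: %s\n", unpack("H*", $utf16_le);
```

Comparing the two hex dumps is a quick way to confirm which byte order you
are about to write to the file or the database.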

      $sql =  "INSERT INTO Tipo_Referencia ( Descricao ) SELECT '$palavra2' AS Expr1;";

Is there a reason why you don't write this as

    $sql = "INSERT INTO Tipo_Referencia ( Descricao ) " .
           "VALUES ('$palavra_em_utf16')"

? The "INSERT INTO table (columns) VALUES (literals)" is, for me, the
usual syntax, and "INSERT INTO table (columns) SELECT literals AS dummy"
looks strange to me.

      print FICH3 $palavra2,"\n";
      $conn->execute($sql,,,adExecuteNoRecords);

This is the same as

    $conn->execute($sql,adExecuteNoRecords);

because Perl collapses consecutive commas in a list. If the constant
adExecuteNoRecords has to be the fourth parameter to ->execute, then
say so:

    $conn->execute($sql, undef, undef, adExecuteNoRecords);

Perl isn't Visual Basic :)
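If you want to see the difference for yourself, here is a small sketch.
fake_execute is a hypothetical stand-in I invented for $conn->execute (it
only hands back whatever arguments it received), and the two strings are
placeholders for your SQL and the constant.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $conn->execute: returns its arguments as a list
# so we can inspect what actually arrives.
sub fake_execute { return @_ }

# Bare commas: Perl treats the empty slots as empty lists and drops them.
my @collapsed = fake_execute('SQL', , , 'FLAG');

# Explicit undef: each slot is a real (undefined) argument.
my @explicit = fake_execute('SQL', undef, undef, 'FLAG');

printf "with bare commas: %d arguments\n", scalar @collapsed;
printf "with undef:       %d arguments\n", scalar @explicit;
```

Only the second call puts the flag in the fourth position.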

To summarise, I think you have misunderstood how Unicode::String works.
utf16() (called as a function, not a method) doesn't convert a string
*to* UTF-16, it expects a string in UTF-16 and converts *from* that
encoding into the internal format used by Unicode::String and returns an
object. Then you can call methods on that object to produce another
encoding such as UTF-8 or Latin-1 or whatever. So conversions involving
Unicode::String generally involve at least two calls.

Cheers,
Philip
