Re: BOM and principle of least surprise

Jarkko Hietaniemi (jhi(_at_)iki(_dot_)fi) writes:

Both input data and the script. Just because the script has been saved
in UTF-8, does not mean that literals in the script are taken as UTF-8.


Oh, great.  Now you want to mix different encodings in the same file.
I give up :-)


I think you misunderstood me. This script was in my original post:

   use strict;
   
   use MSSQL::OlleDB;
   $| = 1;
   my $i = 0;
   foreach (1..2) {
      my $db = 'räksmörgås';
      print "Len " . length($db) . " Str: $db\n";
      my $X = MSSQL::OlleDB->connect(undef, undef, undef, $db);
      $i++;
      print "$i\n" if $i % 50 == 0
   }
 
This script is supposed to connect to a database called "räksmörgås",
a name which in SQL Server is stored as Unicode, in UTF-16. OlleDB is
my XS module, and it uses SvUTF8 to determine whether $db is in UTF-8
or not, and then converts to UTF-16 from the ANSI code page or UTF-8.
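
The decision described above can be approximated in plain Perl. This is
only a sketch, not the XS code in OlleDB: to_utf16le is a hypothetical
helper, cp1252 is an assumed ANSI code page, and utf8::is_utf8 stands in
for the SvUTF8 check.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper, not part of MSSQL::OlleDB: returns UTF-16LE bytes.
sub to_utf16le {
    my ($str, $ansi_cp) = @_;
    $ansi_cp ||= 'cp1252';    # assumption: the ANSI code page is cp1252

    # Flag on:  the scalar already holds characters, use it as is.
    # Flag off: treat the scalar as bytes in the ANSI code page.
    my $chars = utf8::is_utf8($str) ? $str : decode($ansi_cp, $str);

    return encode('UTF-16LE', $chars);    # UTF-16LE, no BOM written
}

# The database name as cp1252 bytes (flag off), written with escapes so
# the sketch does not depend on how this file itself was saved.
my $ansi = "r\xE4ksm\xF6rg\xE5s";
my $wide = to_utf16le($ansi);
print length($wide), "\n";    # 10 characters, 2 bytes each: prints 20
```

A string that already carries the UTF-8 flag skips the decode step, so
both paths end in the same UTF-16 bytes when the characters are the same.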

First I had saved the script in ANSI format, and I connected as I had
expected. Then I saved the script in UTF-8. It still said "räksmörgås"
when I looked at the file, but SvUTF8 still returned false, so I did
not connect to the database successfully.
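
A small sketch of what seems to be going on here: without "use utf8",
Perl takes the bytes of a literal as they are and leaves the UTF-8 flag
off, whatever encoding the editor used when saving. The variable names
below are mine, and the literal is written with escapes to mimic the
bytes a UTF-8 save would put in the file.

```perl
use strict;
use warnings;

# The bytes a UTF-8 save puts in the file for the database name.
my $raw = "r\xC3\xA4ksm\xC3\xB6rg\xC3\xA5s";

my $flag_before = utf8::is_utf8($raw) ? 1 : 0;
my $len_before  = length($raw);

# Decode a copy in place: the bytes are now read as UTF-8 and the
# flag is turned on, which is what a check like SvUTF8 looks at.
my $decoded = $raw;
utf8::decode($decoded);

my $flag_after = utf8::is_utf8($decoded) ? 1 : 0;
my $len_after  = length($decoded);

printf "as read:  flag=%d length=%d\n", $flag_before, $len_before;  # flag=0 length=13
printf "decoded:  flag=%d length=%d\n", $flag_after,  $len_after;   # flag=1 length=10
```

Putting "use utf8;" at the top of a script saved as UTF-8 should have
the same effect on its literals as the explicit decode has here.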

To be able to do that, it would have to understand byte-order marks
(which it doesn't). I think there was a suggestion that you could
specify an


In 5.8.5 it will.


Will such an option include the possibility to say that I want Perl to
determine the encoding from the byte-order mark?

-- 
Erland Sommarskog, Stockholm, sommar(_at_)algonet(_dot_)se
