perl-unicode

Re: BOM and principle of least surprise

2004-05-16 04:30:07
Jarkko Hietaniemi (jhi(_at_)iki(_dot_)fi) writes:
Both input data and the script. Just because the script has been saved
in UTF-8, does not mean that literals in the script are taken as UTF-8.

Oh, great.  Now you want to mix different encodings in the same file.
I give up :-)

I think you misunderstood me. This script was in my original post:

   use strict;
   
   use MSSQL::OlleDB;
   $| = 1;
   my $i = 0;
   foreach (1..2) {
      my $db = 'räksmörgås';
      print "Len " . length($db) . " Str: $db\n";
      my $X = MSSQL::OlleDB->connect(undef, undef, undef, $db);
      $i++;
      print "$i\n" if $i % 50 == 0;
   }
 
This script is supposed to connect to a database called "räksmörgås",
a name which SQL Server stores as Unicode, in UTF-16. OlleDB is my XS
module; it uses SvUTF8 to determine whether $db is in UTF-8 or not, and
then converts to UTF-16 from either the ANSI code page or UTF-8.
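A rough Perl-level model of the decision described above, for illustration only. `utf8::is_utf8` reports the same flag that `SvUTF8` tests in XS; the helper `to_utf16le` is hypothetical and is not MSSQL::OlleDB's actual code, and the ANSI code page is assumed to be cp1252 (the real one depends on the machine's settings).

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: mimic the described XS logic at the Perl level.
# Flag on  -> the string already holds characters, use it as-is.
# Flag off -> treat the bytes as the ANSI code page (assumed cp1252 here).
sub to_utf16le {
    my ($str) = @_;
    my $chars = utf8::is_utf8($str) ? $str : decode('cp1252', $str);
    return encode('UTF-16LE', $chars);
}

# Same name, once as cp1252 bytes (flag off), once as decoded UTF-8 (flag on).
my $ansi = "r\xE4ksm\xF6rg\xE5s";
my $utf8 = decode('UTF-8', "r\xC3\xA4ksm\xC3\xB6rg\xC3\xA5s");

printf "%d bytes each, identical: %s\n",
    length(to_utf16le($ansi)),
    to_utf16le($ansi) eq to_utf16le($utf8) ? 'yes' : 'no';
```

Either path ends in the same UTF-16 bytes; the conversion only goes wrong when the flag says "bytes" but the bytes are really UTF-8.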

First I saved the script in ANSI format, and I connected as expected.
Then I saved the script in UTF-8. It still said "räksmörgås" when I
looked at the file, but SvUTF8 returned false, so I could not connect
to the database.
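The effect can be reproduced without a database. A minimal sketch, assuming Perl takes the literal as the raw bytes the editor wrote (which is what the SvUTF8 result above indicates): `length` then counts 13 bytes rather than 10 characters, and the UTF-8 flag is off. Decoding the bytes (or, for literals in the script, the `use utf8` pragma) yields a 10-character string with the flag on.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The bytes an editor writes for "räksmörgås" when saving as UTF-8.
# Without "use utf8", a literal in the script is exactly these bytes.
my $raw = "r\xC3\xA4ksm\xC3\xB6rg\xC3\xA5s";
printf "raw:     length %d, UTF-8 flag %s\n",
    length($raw), utf8::is_utf8($raw) ? 'on' : 'off';

# Decoding the bytes gives a character string with the flag on.
my $chars = decode('UTF-8', $raw);
printf "decoded: length %d, UTF-8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';
```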

To be able to do that, it would have to understand byte-order marks
(which it doesn't). I think there was a suggestion that you could
specify an 

In 5.8.5 it will.

Will such an option include the possibility to say that I want Perl to
determine the encoding from the byte-order mark?
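To make the question concrete, here is a minimal sketch of what determining the encoding from a byte-order mark involves. The helper `sniff_bom` is hypothetical; it is not a Perl or Encode API and says nothing about how perl itself does or will do this. UTF-32 is deliberately ignored, so a UTF-32LE BOM (FF FE 00 00) would be misreported as UTF-16LE.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: guess the encoding of a byte string from a leading
# byte-order mark. Returns (encoding name, BOM length), or () if no BOM.
sub sniff_bom {
    my ($bytes) = @_;
    return ('UTF-8',    3) if $bytes =~ /\A\xEF\xBB\xBF/;
    return ('UTF-16LE', 2) if $bytes =~ /\A\xFF\xFE/;
    return ('UTF-16BE', 2) if $bytes =~ /\A\xFE\xFF/;
    return;
}

# Example: a script line saved as UTF-8 with a BOM, read back as raw bytes.
my $file = "\xEF\xBB\xBFmy \$db = 'r\xC3\xA4ksm\xC3\xB6rg\xC3\xA5s';";
my ($enc, $len) = sniff_bom($file);

# Strip the BOM and decode; with no BOM, leave the bytes untouched.
my $text = defined $enc ? decode($enc, substr($file, $len)) : $file;
```

With a BOM present, the decoded line holds the ten characters of the name; without one, the bytes pass through unchanged, which is the situation described above.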

-- 
Erland Sommarskog, Stockholm, sommar(_at_)algonet(_dot_)se