perl-unicode

Re: UTF-8 case conversion

2003-09-03 06:30:08
On Wed, 3 Sep 2003, Jarkko Hietaniemi wrote:

use Encode 'from_to';

my $orjan = 'ÖRJAN';
my $lundstrom = 'LUNDSTRÖM';

print $orjan . ' ' . $lundstrom . "\n";

from_to $orjan,'latin1','utf-8';
from_to $lundstrom,'latin1','utf-8';

It is my understanding that from_to is the wrong thing to use here. The

Your understanding is correct.

It was me that didn't understand ;)

- you obtain some character data, for example by putting it literally in
  your script. If the script itself is in utf-8, it should contain
  "use utf8;". If not (like your script), perl will assume ISO-8859-1.

Or "use encoding 'whatever';", and Perl actually assumes whatever is
your native encoding, be it ISO 8859-1, or -2, or CP1252, or EBCDIC,
or whatever.

  A different source of data would be reading from a file, which is
  opened with the correct encoding specified (see Andreas' reply).

  A third source would be by reading a file or a socket and obtaining raw
  bytes which can be interpreted as characters using decode().

In this case, e.g.:

$lundstrom = decode("latin-1", $lundstrom);
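A minimal sketch of the difference between the two calls, assuming the input really is ISO-8859-1 bytes. The byte string is written with a hex escape so the example does not depend on how the script file itself is saved; the variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode from_to);

# ISO-8859-1 bytes for "LUNDSTRÖM" (0xD6 is Ö in Latin-1).
my $bytes = "LUNDSTR\xD6M";

# decode() returns a character string, so lc() can treat
# U+00D6 as a letter and lowercase it to U+00F6.
my $chars = decode('iso-8859-1', $bytes);
my $lower = lc $chars;

# from_to() converts in place and leaves UTF-8 *bytes* behind;
# perl still sees one byte per "character".
my $octets = $bytes;
from_to($octets, 'iso-8859-1', 'utf-8');

print length($chars),  "\n";   # 9  (characters)
print length($octets), "\n";   # 10 (bytes; Ö takes two in UTF-8)
```

This is why from_to is the wrong tool before a case conversion: it produces bytes for output, not characters for string operations.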

This starts to look like the application where I will use this stuff. I
use the university LDAP server for authentication and to get some
elementary info about authors of dissertations. The LDAP server returns
the stuff in uppercase UTF-8. I want to store them in a bibliographic
database, in a more typographically appealing form. I get my data from
the Net::LDAP module. The strings don't seem to be decoded...
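For that use case, something along these lines might do, as a rough sketch. The value in $from_ldap is a hypothetical stand-in for what the server returns (no real Net::LDAP call is made here), and the "capitalise each word" rule is just one possible idea of typographically appealing.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical stand-in for an attribute value from the LDAP server:
# the UTF-8 bytes of "ÖRJAN LUNDSTRÖM".
my $from_ldap = "\xC3\x96RJAN LUNDSTR\xC3\x96M";

# Turn the raw bytes into characters before touching the case.
my $name = decode('UTF-8', $from_ldap);

# Lowercase each word, then uppercase its first character.
my $pretty = join ' ', map { ucfirst(lc($_)) } split ' ', $name;

# Encode again when the database or output channel wants UTF-8 bytes.
my $for_storage = encode('UTF-8', $pretty);
print $for_storage, "\n";
```

Names with particles or hyphens would need a more careful rule than ucfirst(lc(...)).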

but then ...

  from_to($data, "iso-8859-1", "utf8"); #1
  $data = decode("iso-8859-1", $data);  #2

I added

binmode STDOUT, ":utf8";

at the top, and

$data_in_my_script = decode("utf8", $data_from_LDAP);

and by that I'm a much happier man than an hour ago!
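Put together, the two changes look roughly like this. It is only a sketch: the LDAP value is a hypothetical byte string, and the output goes to an in-memory filehandle with the same ":utf8" layer instead of STDOUT, so that the resulting bytes can be inspected.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical UTF-8 bytes, as they might arrive from the LDAP server.
my $data_from_LDAP = "LUNDSTR\xC3\x96M";

# Bytes in, characters out.
my $data_in_my_script = decode("utf8", $data_from_LDAP);

# An in-memory handle with the :utf8 layer, standing in for
# binmode STDOUT, ":utf8";
my $captured = '';
open my $out, '>:utf8', \$captured
    or die "cannot open in-memory handle: $!";
print {$out} lc $data_in_my_script;
close $out;

# $captured now holds the UTF-8 bytes of the lowercased string.
print length($captured), "\n";   # 10
```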

Thanks again

Sigfrid

