On Wed, 3 Sep 2003, Jarkko Hietaniemi wrote:
use Encode 'from_to';
my $orjan = 'ÖRJAN';
my $lundstrom = 'LUNDSTRÖM';
print $orjan . ' ' . $lundstrom . "\n";
from_to $orjan,'latin1','utf-8';
from_to $lundstrom,'latin1','utf-8';
It is my understanding that from_to is the wrong thing to use here. The
Your understanding is correct.
It was me that didn't understand ;)
- you obtain some character data, for example by putting it literally in
your script. If the script itself is in utf-8, it should contain
"use utf8;". If not (like your script), perl will assume ISO-8859-1.
Or "use encoding 'whatever';", and Perl actually assumes whatever is
your native encoding, be it ISO 8859-1, or -2, or CP1252, or EBCDIC,
or whatever.
A different source of data would be reading from a file, which is
opened with the correct encoding specified (see Andreas' reply).
A third source would be by reading a file or a socket and obtaining raw
bytes which can be interpreted as characters using decode().
In this case, e.g.:
$lundstrom = decode("latin-1", $lundstrom);
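As a minimal sketch of that third case: the byte string below is a hypothetical stand-in for data read from a file or socket, and the variable name $raw is made up for illustration. It only shows that decode() turns ISO-8859-1 bytes into a character string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw input: ISO-8859-1 bytes as they might arrive from a
# file or socket. "\xD6" is the Latin-1 byte for O with diaeresis.
my $raw = "LUNDSTR\xD6M";

# decode() interprets the bytes as ISO-8859-1 and returns characters.
my $lundstrom = decode("iso-8859-1", $raw);

# Encode on output, so the character goes out as UTF-8.
binmode STDOUT, ":utf8";
print $lundstrom, "\n";
print length($lundstrom), " characters\n";
```

In Latin-1 one byte is one character, so the length does not change here; what changes is that Perl now treats the string as characters rather than as opaque bytes.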
This starts to look like the application where I will use this stuff. I
use the university ldap server for authentication and to get some
elementary info about authors of dissertations. The LDAP server returns
the stuff in uppercase utf-8. I want to store them in a bibliographic
database, in a more typographically appealing form. I get my data from
the Net::LDAP module. The strings don't seem to be decoded...
but then ...
from_to($data, "iso-8859-1", "utf8"); #1
$data = decode("iso-8859-1", $data); #2
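The lines marked #1 and #2 do different things, which a small sketch may make clearer. The input bytes and variable names here are assumptions for illustration: from_to() converts a byte buffer in place into another byte encoding, while decode() returns a character string.

```perl
use strict;
use warnings;
use Encode qw(from_to decode);

# Hypothetical input: ISO-8859-1 bytes, one byte per character.
my $bytes = "\xD6RJAN";

# #1: from_to() rewrites its first argument in place, bytes in, bytes out.
my $copy = $bytes;
from_to($copy, "iso-8859-1", "utf8");
my $len_bytes = length($copy);     # counts UTF-8 bytes

# #2: decode() leaves the input alone and returns characters.
my $chars = decode("iso-8859-1", $bytes);
my $len_chars = length($chars);    # counts characters

print "from_to: $len_bytes, decode: $len_chars\n";
```

After #1 the buffer holds six UTF-8 bytes that Perl still treats as bytes; after #2 the string holds five characters.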
I added
binmode STDOUT, ":utf8";
at the top, and
$data_in_my_script = decode("utf8", $data_from_LDAP);
and by that I'm a much happier man than an hour ago!
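Put together, the approach might look like the sketch below. The byte string stands in for a value returned by Net::LDAP (no LDAP call is made here), and the capitalisation step and the name $pretty are only one assumed way of making uppercase names more readable.

```perl
use strict;
use warnings;
use Encode qw(decode);

binmode STDOUT, ":utf8";

# Stand-in for an LDAP attribute value: uppercase text as UTF-8 bytes.
# "\xC3\x96" is the two-byte UTF-8 sequence for U+00D6.
my $data_from_LDAP = "\xC3\x96RJAN LUNDSTR\xC3\x96M";

# Turn the UTF-8 bytes into a character string.
my $data_in_my_script = decode("utf8", $data_from_LDAP);

# With characters, lc() and ucfirst() handle the non-ASCII letters.
my $pretty = join ' ',
    map { ucfirst(lc($_)) } split / /, $data_in_my_script;

print length($data_from_LDAP), " bytes, ",
      length($data_in_my_script), " characters\n";
print "$pretty\n";
```

Without the decode() step, lc() would see the two bytes of each Ö separately and leave them unchanged.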
Thanks again
Sigfrid