Octavian Rasnita <orasnita(_at_)fcc(_dot_)ro> writes:
Oh, sorry, but I've made a mistake when writing the message.
The Romanian language uses ISO-8859-2, not ISO-8859-1.
So the question remains: is it possible to decode a text written in several
languages that use several charsets?
Yes. But perhaps not as easily as you would like.
You need markers which show where the encodings change.
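As a rough sketch of what such markers could look like: the "<encoding>|<octets>" line format below is purely hypothetical (it is not something Encode defines), and the sample words are made up for illustration. Each line names its own encoding, so each one can be decoded separately.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input format: every line carries its own marker,
# written as "<encoding>|<octets>". The octets are given as escapes
# so that this script itself stays plain ASCII.
my @lines = (
    "iso-8859-1|a\xF1o",    # Spanish word, n with tilde is 0xF1
    "iso-8859-2|m\xE3r",    # Romanian word, a with breve is 0xE3
);

my @decoded;
for my $line (@lines) {
    # Split off the marker, then decode the rest with the named encoding.
    my ($encoding, $octets) = split /\|/, $line, 2;
    push @decoded, decode($encoding, $octets);
}

binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @decoded;
```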
For perl purposes the language is not important, it is the
"charset" (encoding) that matters. The encoding determines what
the 8-bit bytes (also called octets) in a file mean as characters.
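To make that concrete, here is a small sketch (the variable names are only illustrative) that decodes the same single octet, 0xE3, under both encodings and prints the resulting Unicode code points.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One octet, read under two different encodings.
my $octet = "\xE3";

my $as_latin1 = decode('iso-8859-1', $octet);    # a with tilde
my $as_latin2 = decode('iso-8859-2', $octet);    # a with breve

printf "iso-8859-1: U+%04X\n", ord $as_latin1;
printf "iso-8859-2: U+%04X\n", ord $as_latin2;
```

The octet is identical in both cases; only the encoding passed to decode() changes which character comes out.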
So one "file" can normally only be in one encoding - this includes
the perl script. Unicode and UTF-8 are designed to avoid this problem
because UTF-8 can represent any Unicode code point and there
are Unicode code points for (almost) all characters used by any
language.
However older 8-bit encodings like iso-8859-1 and iso-8859-2 pick
different 256 character subsets. If I recall correctly
So you cannot just enter 8-bit string literals in both encodings
into one perl script, and have perl know what they are directly.
But you can have
use Encode;
my $spanish = "...";
my $romanian = "...";
# Note that only one of those can "look right" in an iso-8859-* editor
my $combined = Encode::decode('iso-8859-1', $spanish)
             . Encode::decode('iso-8859-2', $romanian);
You can then "print" the combined string as UTF-8 (or other Unicode
encoding). But you will then need some way of viewing the Unicode
file. An editor which can view the UTF-8 file will probably also
allow you to enter UTF-8 strings directly as well. So you could
write your script in UTF-8 and avoid the problem.
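A minimal sketch of the "print as UTF-8" step, assuming the two strings arrive as raw octets. The sample words are invented for illustration and written as escapes so the script itself is plain ASCII.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $spanish  = "a\xF1o";    # iso-8859-1 octets, 0xF1 is n with tilde
my $romanian = "m\xE3r";    # iso-8859-2 octets, 0xE3 is a with breve

# After decoding, both pieces are ordinary Perl character strings
# and can be joined freely.
my $combined = decode('iso-8859-1', $spanish) . ' '
             . decode('iso-8859-2', $romanian);

# Let the output layer turn characters into UTF-8 octets.
binmode STDOUT, ':encoding(UTF-8)';
print $combined, "\n";
```

Setting the encoding layer on the filehandle is one way to do it; calling Encode::encode('UTF-8', $combined) and printing the resulting octets should work as well.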
Note that you cannot (in general) "print" the combined string as
either 8859-1 or 8859-2, because each of them lacks some characters
that the other one has.
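One way to check this for a given string is to ask Encode to die on any character the target encoding cannot represent. This is only a sketch with invented sample words; the %ok hash is just an illustrative name.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $combined = decode('iso-8859-1', "a\xF1o")     # n with tilde: not in iso-8859-2
             . decode('iso-8859-2', "m\xE3r");    # a with breve: not in iso-8859-1

my %ok;
for my $target ('iso-8859-1', 'iso-8859-2', 'UTF-8') {
    my $copy = $combined;    # encode() with a CHECK value may modify its input
    # FB_CROAK makes encode() die instead of substituting a replacement.
    $ok{$target} = eval { encode($target, $copy, Encode::FB_CROAK()); 1 } ? 1 : 0;
    printf "%-10s %s\n", $target,
        $ok{$target} ? 'ok' : 'cannot represent every character';
}
```

Without a CHECK argument encode() does not fail; it quietly substitutes a replacement character, which is easy to miss.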
Thank you.
----- Original Message -----
From: "Nick Ing-Simmons" <nick(_dot_)ing-simmons(_at_)elixent(_dot_)com>
To: <orasnita(_at_)fcc(_dot_)ro>
Sent: Tuesday, April 13, 2004 11:13 AM
Subject: Re: Decoding more languages
Octavian Rasnita <orasnita(_at_)fcc(_dot_)ro> writes:
Hello all,
I want to convert a text that contains words in several languages (it is a
course for learning a foreign language) to UTF-8.
I have 2 texts, one that contains Romanian and French words, and another
one that contains Romanian and Spanish words.
I have seen that I can Encode::decode('ISO-8859-1', $text) the romanian