Re: Understanding Unicode support in Perl

On Fri, Jan 24, 2003 at 03:23:49PM -0500, Tay, William wrote:

Internally, Perl represents character strings in UTF8. The PerlIO layer for
input and output enables other encodings to be used for STDIN, STDOUT,
STDERR and filehandling operations. For instance, if ru_RU.KOI8-R is
specified (use open ':encoding(ru_RU.KOI8-R)';) as the encoding for data
coming from STDIN, it will be converted (by PerlIO ?) into UTF8 for internal
representation, and from UTF8 to ru_RU.KOI8-R for STDOUT.


That's correct.
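
A minimal sketch of that round trip, assuming a perl with PerlIO encoding
layers (5.8 or later). It uses in-memory filehandles instead of STDIN and
STDOUT so that it is self-contained, and it spells the encoding as
"koi8-r", the name Encode knows it by, rather than the locale-style name
used in the question. The variable names are only illustrative.

```perl
use strict;
use warnings;

# The Russian word "mir" as KOI8-R bytes (three bytes, one per letter).
my $koi8_bytes = "\xCD\xC9\xD2";

# Reading through an :encoding layer converts the bytes into Perl's
# internal character representation.
open(my $in, '<:encoding(koi8-r)', \$koi8_bytes) or die "open for read: $!";
my $text = <$in>;
close $in;

# $text is now a character string: three characters, not three bytes.
my $char_count = length $text;
my @codepoints = map { sprintf 'U+%04X', ord } split //, $text;
print "characters: $char_count (@codepoints)\n";

# Writing through the same layer converts the characters back to KOI8-R.
my $out_bytes = '';
open(my $out, '>:encoding(koi8-r)', \$out_bytes) or die "open for write: $!";
print {$out} $text;
close $out;

my $round_trip_ok = ($out_bytes eq $koi8_bytes) ? 1 : 0;
print "round trip ok: $round_trip_ok\n";
```

The same layers can be attached to the standard handles with binmode()
or with the open pragma mentioned in the question.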

Questions:
1. Before UTF8 was used as the internal character encoding (before 5.6 ?),
what default encoding was used to represent data internally?


Strings were simply stored as sequences of bytes, akin to C.

2. What are the measures taken for backward compatibility?


Strings are divided into two classes: Unicode strings and byte strings.
Unless explicitly requested otherwise, all data defaults to the second
class.  You can "promote" a string to Unicode by concatenating it with a
Unicode string, by asking for it explicitly via PerlIO layers, by passing
it through Encode::decode(), or by calling utf8::upgrade() on it manually.
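
As a rough illustration of three of those routes (the PerlIO route needs a
filehandle and is left out here), the sketch below checks the internal
flag with Encode::is_utf8(). The sample string and variable names are made
up for the example, and it assumes the byte string holds Latin-1 data.

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# A plain byte string: "caf" followed by byte 0xE9 (e-acute in Latin-1).
my $bytes = "caf\xE9";
my $flag_before = is_utf8($bytes) ? 1 : 0;   # byte strings are unflagged

# 1. Concatenating with a Unicode string promotes the result.
my $smiley = "\x{263A}";                     # code point above 0xFF
my $joined = $bytes . $smiley;
my $flag_joined = is_utf8($joined) ? 1 : 0;

# 2. Encode::decode() turns bytes in a named encoding into characters.
my $decoded = decode('iso-8859-1', $bytes);
my $flag_decoded = is_utf8($decoded) ? 1 : 0;

# 3. utf8::upgrade() changes only the internal representation;
#    the string still compares equal to the original.
my $upgraded = $bytes;
utf8::upgrade($upgraded);
my $flag_upgraded = is_utf8($upgraded) ? 1 : 0;
my $same_content  = ($upgraded eq $bytes) ? 1 : 0;

print "before: $flag_before, joined: $flag_joined, ",
      "decoded: $flag_decoded, upgraded: $flag_upgraded\n";
print "lengths: ", length($joined), " and ", length($upgraded), "\n";
```

Note that utf8::upgrade() treats the bytes as Latin-1 code points, so it
is only a substitute for decode() when the data really is Latin-1.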

Since none of those methods is present in older perls, compatibility
is maintained by default. [1]

Hope this helps,
/Autrijus/

[1] There is an exception: in Perl 5.8.0, if your locale indicates that
    you can handle UTF-8, all IO filehandles are marked as ':utf8'.
    This controversial behaviour will probably go away by Perl 5.8.1,
    where you will need to use "perl -C" explicitly to get it.
