perl-unicode

Re: Explaining this behavior (was Re: good name for characters matching [^\0-\377]?)

2007-10-22 09:48:06
On 10/22/07, Juerd Waalboer <juerd(_at_)convolution(_dot_)nl> wrote:

> There's an alternative way of viewing this: there are two types of
> strings: binary and text. If you encode text, you get binary.

I think I'm trying to make a slightly different point: part of what
Encode::encode MUST do is to create a Perl string with a particular
internal representation. For example, in:

  $a = Encode::encode(...);
  chop($b = $a."\x{101}");

we have $a eq $b, but $b is now stored in Perl's internal utf8
representation while $a is not, so $r->print($b) will probably not
give you the output you want.
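
To make that concrete, here is a minimal sketch of the same steps with the
internal state made visible. The names $bytes and $upgraded stand in for
$a and $b above, and the 'UTF-8' encoding and the sample text are
assumptions chosen only for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Encode text to UTF-8 octets. Encode::encode returns a string whose
# UTF8 flag is off (the conventional byte representation).
my $bytes = Encode::encode('UTF-8', "caf\x{e9}");

# Append a character above 255, then chop it off again. The append
# upgrades the string to the internal utf8 representation, and chop
# does not downgrade it afterwards.
my $upgraded = $bytes . "\x{101}";
chop $upgraded;

my $same        = ($bytes eq $upgraded)     ? 1 : 0;
my $flag_before = utf8::is_utf8($bytes)     ? 1 : 0;
my $flag_after  = utf8::is_utf8($upgraded)  ? 1 : 0;

print "eq: $same, flag before: $flag_before, flag after: $flag_after\n";
```

The two strings compare equal, yet only one of them carries the UTF8 flag,
which is the difference that code looking at the raw buffer can observe.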

I find the implications of this interesting. In particular, the
conventional internal representation (the one Perl uses when the
string has never seen any character ordinals > 255) cannot be left out
of any presentation of Perl strings, since it is required for
communication with modules such as mod_perl. The utf8 representation,
on the other hand, can be left out, as programmers should not care how
Perl internally represents a string that contains characters matching
[^\0-\377].
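
As a rough illustration of why the conventional representation matters at
such a boundary, the sketch below uses bytes::length as a stand-in for
code that reads a string's buffer without consulting the UTF8 flag. That
stand-in is an assumption for demonstration; it is not a claim about how
mod_perl itself is implemented. utf8::downgrade is shown as one way to
force the byte representation back before handing the string on.

```perl
use strict;
use warnings;
use Encode ();
use bytes ();   # makes bytes::length available without enabling the pragma

my $octets = Encode::encode('UTF-8', "caf\x{e9}");   # 5 octets, flag off

# Same characters, but stored in the internal utf8 representation.
my $upgraded = $octets;
utf8::upgrade($upgraded);

# Size of the underlying buffer, ignoring the UTF8 flag.
my $raw_plain    = bytes::length($octets);
my $raw_upgraded = bytes::length($upgraded);

# Convert back to the byte representation in place.
my $restored = $upgraded;
utf8::downgrade($restored);
my $raw_restored = bytes::length($restored);

print "$raw_plain $raw_upgraded $raw_restored\n";   # prints "5 7 5"
```

The octets \xC3 and \xA9 each take two bytes once upgraded, so the buffer
grows from 5 to 7 bytes even though length() reports 5 characters in every
case.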