perl-unicode

Re: Explaining this behavior (was Re: good name for characters matching [^\0-\377]?)

2007-10-22 09:59:10
E R wrote 2007-10-22 11:47 (-0500):
I think I'm trying to make a slightly different point: part of what
Encode::encode MUST do is to create a Perl string with a particular
internal representation. For example, in:
  $a = Encode::encode(...);
  chop($b = $a."\x{101}");
we have $a eq $b, but $b is now stored as utf8 internally, so
$r->print($b) will probably not give you the output you want.
I find the implications of this interesting. In particular, the
conventional internal representation (the one Perl uses when the
string has never seen any character ordinals > 255) cannot be left out
of any presentation of Perl strings since it is required for
communication with modules such as mod_perl. The utf8
representation, on the other hand, can be left out as programmers
should not care how Perl internally represents the string when there
are characters matching [^\0-\377].

This is exactly true. I was merely pointing out the same thing from a
different perspective.
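
For anyone who wants to see the effect outside mod_perl, here is a
minimal sketch of the quoted example. The variable names and the sample
character (U+00E9) are chosen for illustration only, and bytes::length
is used purely to peek at the size of the internal buffer, which normal
code should not rely on.

```perl
use strict;
use warnings;
use Encode ();
use bytes ();   # loads bytes::length without enabling the pragma

# Encode::encode returns a byte string: U+00E9 becomes the bytes C3 A9.
my $bytes = Encode::encode('UTF-8', "\x{e9}");

# Appending U+0101 upgrades the string to the internal utf8 format;
# chop removes that character again, but the string stays upgraded.
chop(my $copy = $bytes . "\x{101}");

my $same_chars   = $bytes eq $copy;          # eq compares characters
my $bytes_is_up  = utf8::is_utf8($bytes);    # internal format of original
my $copy_is_up   = utf8::is_utf8($copy);     # internal format of the copy
my $bytes_buffer = bytes::length($bytes);    # size of internal buffer
my $copy_buffer  = bytes::length($copy);

# utf8::downgrade converts the copy back to one byte per character.
my $fixed = $copy;
utf8::downgrade($fixed);
my $fixed_buffer = bytes::length($fixed);

printf "eq: %s, upgraded: %s/%s, buffer sizes: %d, %d, %d\n",
    ($same_chars  ? 'yes' : 'no'),
    ($bytes_is_up ? 'yes' : 'no'),
    ($copy_is_up  ? 'yes' : 'no'),
    $bytes_buffer, $copy_buffer, $fixed_buffer;
```

The two strings compare equal, yet only the copy is upgraded and its
internal buffer is twice as long. Code that reads the buffer directly,
rather than the characters, would see different data for $bytes and
$copy.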

Note, by the way, that even strings that have never contained a
character outside chr 0..255 may be utf8 internally. This should never
happen with binary operations, but it may happen with text operations.
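
A small sketch of how a text operation can produce such a string,
assuming substr as the text operation and ISO-8859-1 as the output
encoding (both picked only for illustration):

```perl
use strict;
use warnings;
use Encode ();

# A text string that contains one character above 255 (U+263A).
my $text = "caf\x{e9} \x{263a}";

# Text operation: the first four characters are all in 0..255,
# yet the result inherits the utf8 internal format of $text.
my $word = substr($text, 0, 4);

# The same four characters built directly as a plain string.
my $latin = "caf\x{e9}";

my $word_is_up  = utf8::is_utf8($word);
my $latin_is_up = utf8::is_utf8($latin);
my $same_chars  = $word eq $latin;

# Encoding explicitly at the output boundary yields identical bytes,
# whatever the internal format of the input was.
my $out_word  = Encode::encode('ISO-8859-1', $word);
my $out_latin = Encode::encode('ISO-8859-1', $latin);
my $same_out  = $out_word eq $out_latin;

printf "word upgraded: %s, latin upgraded: %s, equal: %s, same output: %s\n",
    map { $_ ? 'yes' : 'no' }
        $word_is_up, $latin_is_up, $same_chars, $same_out;
```

Here $word and $latin hold the same characters but differ in internal
format. Encoding explicitly before output removes the difference.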
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <#####(_at_)juerd(_dot_)nl>  
<http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy 
<sales(_at_)convolution(_dot_)nl>