On Wed, Jun 16, 1999 at 01:44:12PM -0700, Gurusamy Sarathy wrote:
If I understand you correctly, you are suggesting that Perl should
use utf8 (not bytes) as its internal representation for all data.
Absolutely not. I suggested that Perl *can* use utf8 (not bytes) as
its internal representation for *any* data. Then 'use utf8' will switch
OPs to ones which are able to distinguish whether a given SV is utf8
or byte-encoded (most OPs do not care).
So then, what happens when a utf8-encoded SV is passed to an OP
that doesn't want it?
Like what? AFAIK, OPs are divided into two categories:
a) Those affected by the current 'use utf8' (9? of them). They
will know how to handle the SvIsUTF8 flag (a minor edit).
b) Not affected by the current 'use utf8'. They do not care now,
they will not care after this change.
Some changes to core-API functions may be needed. Say, sv_catsv()
will need to sv_2utf8() one of the arguments if the flags on the
arguments mismatch (or do something similar).
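
To make the sv_catsv() point concrete, here is a rough standalone sketch of "upgrade on flag mismatch, then append". It is not Perl source. The toy_sv struct and the toy_new(), toy_2utf8() and toy_catsv() helpers are hypothetical names invented for illustration, and it assumes that byte-encoded data means Latin-1 (one byte per code point), which the thread itself does not state.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an SV holding a string; not Perl's real SV. */
typedef struct {
    char  *buf;      /* NUL-terminated buffer */
    size_t len;      /* length in bytes, not characters */
    int    is_utf8;  /* analogue of the proposed SvIsUTF8 flag */
} toy_sv;

static toy_sv toy_new(const char *bytes, size_t len, int is_utf8)
{
    toy_sv sv;
    sv.buf = malloc(len + 1);
    assert(sv.buf != NULL);
    memcpy(sv.buf, bytes, len);
    sv.buf[len] = '\0';
    sv.len = len;
    sv.is_utf8 = is_utf8;
    return sv;
}

/* Upgrade in place. ASSUMPTION: byte-encoded data is Latin-1, so every
   byte >= 0x80 becomes a two-byte UTF-8 sequence. */
static void toy_2utf8(toy_sv *sv)
{
    size_t i, j = 0, extra = 0;
    char *out;

    if (sv->is_utf8)
        return;                          /* already utf8: nothing to do */
    for (i = 0; i < sv->len; i++)
        if ((unsigned char)sv->buf[i] >= 0x80)
            extra++;
    out = malloc(sv->len + extra + 1);
    assert(out != NULL);
    for (i = 0; i < sv->len; i++) {
        unsigned char c = (unsigned char)sv->buf[i];
        if (c < 0x80) {
            out[j++] = (char)c;
        } else {
            out[j++] = (char)(0xC0 | (c >> 6));
            out[j++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[j] = '\0';
    free(sv->buf);
    sv->buf = out;
    sv->len = j;
    sv->is_utf8 = 1;
}

/* Append src to dst. If the flags mismatch, upgrade the byte-encoded
   side first so that the result has a single, consistent encoding. */
static void toy_catsv(toy_sv *dst, const toy_sv *src)
{
    /* Work on a copy so that src itself is never modified. */
    toy_sv tmp = toy_new(src->buf, src->len, src->is_utf8);

    if (dst->is_utf8 != tmp.is_utf8) {
        toy_2utf8(dst);                  /* no-op for the side that is utf8 */
        toy_2utf8(&tmp);
    }
    dst->buf = realloc(dst->buf, dst->len + tmp.len + 1);
    assert(dst->buf != NULL);
    memcpy(dst->buf + dst->len, tmp.buf, tmp.len + 1);  /* copies the NUL too */
    dst->len += tmp.len;
    free(tmp.buf);
}
```

When both flags already agree, toy_catsv() does no conversion at all, which is the "they do not care" case: the upgrade cost is paid only on a mismatch.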
How does it "see" the real data? It either
has to convert it (in order to, say, print it to a file, call
some system API, etc.) or the utf8-encoded SV has to have a cached
copy of the "byte-encoded" data. I don't think you mean the latter.
Basically, a set IsUTF8 flag is a pretty good indication that you will
have problems if you try to convert the SV to bytes. (There may be
exceptions, like doing a substr() which does not pick chars > 127.)
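
The exception mentioned above can be checked cheaply. A minimal sketch, with a hypothetical helper name that is not part of any Perl API: if no byte in the buffer has its high bit set, every character is <= 127, the utf8 and byte representations are identical, and the conversion is a no-op.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every character is <= 127, i.e. the
   buffer reads the same whether it is treated as utf8 or as bytes. */
static int fits_in_bytes_unchanged(const char *s, size_t len)
{
    size_t i;

    assert(s != NULL);
    for (i = 0; i < len; i++)
        if ((unsigned char)s[i] > 127)
            return 0;   /* part of a multi-byte sequence: conversion is lossy or needs work */
    return 1;
}
```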
Here is the scoop:
a) printing: all I/O goes via conversion (possibly empty, if the
data is not SvIsUTF8, *and* no translations on a channel are
Skipping characters will not be a simple C<string++>; instead,
it will need to be done with C<string += UTF8SKIP(string)>.
??? *When* do you skip chars? The opcodes which need to already
do it this way.
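
For readers unfamiliar with the idiom under discussion, here is a rough standalone sketch of stepping through a buffer by character rather than by byte. utf8_skip() is a hypothetical stand-in for what the UTF8SKIP macro provides (Perl's own macro is implemented differently), and the sketch assumes NUL-terminated, mostly well-formed input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for UTF8SKIP(): number of bytes in the UTF-8
   sequence introduced by lead byte b. */
static size_t utf8_skip(unsigned char b)
{
    if (b < 0x80)           return 1;  /* ASCII */
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    if ((b & 0xF8) == 0xF0) return 4;
    return 1;  /* stray continuation or invalid byte: advance one byte so the loop progresses */
}

/* Count characters (not bytes) in a NUL-terminated UTF-8 string. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (*s) {
        size_t skip = utf8_skip((unsigned char)*s);
        /* The analogue of string += UTF8SKIP(string), but stopping at the
           NUL in case the final sequence is truncated. */
        while (skip-- && *s)
            s++;
        n++;
    }
    return n;
}
```

A plain string++ loop over the same buffer would count bytes, so the two approaches agree only when every character is ASCII.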
Not as far as I can see. Unless you restrict what type of SVs
can be passed to what OPs, all OPs have to deal with utf8-encoded
SVs (just as all OPs have to deal with magic for magic to work
??? I repeat: give an example.