On Sun, Jun 13, 1999 at 12:50:02PM -0600, Tom Christiansen wrote:
> > I was discussing what Sarathy said, and that I do not understand his
> > usage of "byte". Since you did not say anything on this thread - yet,
> > I have no problem with *you* using this word - yet. ;-)
>
> A byte has come to be a standard, eight-bit unit of memory storage,
> sometimes referred to as octet. I don't understand how there can be
> any question here.

We are discussing Perl here, not C. In Perl, there is no access to
memory storage, so using such terms is misleading at best. Perl
works in much higher-level terms, like "number" or "string".
Now please reread what I wrote. Perl strings are not sequences of
bytes any more (as they were in 5.005). With the current
implementation of UTF-8 the term is misleading, since "wideness" is
attached to the code, not the data. My argument is that by attaching
wideness to the data instead of the code, we can make wideness
*transparent* without sacrificing performance for the operations
which do not *require* wideness.
Then the absence of a global utf8 switch becomes an optimization only:
all programs work exactly the same (or better ;-) with the
addition of a global-utf8 switch.
[Here 'or better' means that they die in fewer situations: say,
without utf8, \x{FFF} is a fatal error.]
Ilya