On Mon, 14 Jun 1999 08:56:18 BST, Nick Ing-Simmons wrote:
Gurusamy Sarathy <gsar(_at_)activestate(_dot_)com> writes:
I'm beginning to see where you are not "getting" it. You are
arguing from the POV of some mythical implementation that doesn't
Yes, he is - in fact he is suggesting that such an implementation may be superior.
At this stage of Unicode support, it makes sense to at least
consider such alternatives.
I'm always willing to consider alternatives.
One way of looking at this might be to have an SvCHAROK flag akin to SvPOK
but _without_ a corresponding change in internal representation.
(Although such an alternate representation may be useful for utf16 or to
provide a slot for the encoding.)
Data starts off neutral. When an op treats the string as 'chars'
(due to explicit utf8/locale in scope or whatever) the bit gets set.
Then later, when the data is processed by an op that can jump
either way, it can use the flag to decide. A bit like the way SvIOK vs SvPOK
behaves with & etc.
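
A rough sketch of the flag idea above, in standalone C rather than real perl source. The struct, the flag bits (F_POK, F_CHAROK) and the op_* functions are all hypothetical names made up for illustration; none of them exist in perl's headers. The point is only that the buffer stays as bytes and the flag alone decides how a "can jump either way" op behaves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration; NOT perl's real SV flags. */
enum { F_POK = 1u << 0, F_CHAROK = 1u << 1 };

/* Hypothetical stand-in for a string value: the bytes never change
   representation, only the flag word does. */
typedef struct {
    unsigned flags;
    const unsigned char *buf;
    size_t len;               /* length in bytes */
} Str;

/* Count UTF-8 characters by counting every byte that is not a
   continuation byte (10xxxxxx). Assumes well-formed UTF-8. */
static size_t count_utf8_chars(const unsigned char *p, size_t n) {
    size_t chars = 0;
    for (size_t i = 0; i < n; i++)
        if ((p[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}

/* An op running with explicit "chars" semantics in scope sets the bit. */
static void op_treat_as_chars(Str *s) {
    assert(s != NULL);
    s->flags |= F_CHAROK;
}

/* An op that can jump either way consults the flag to decide. */
static size_t op_length(const Str *s) {
    assert(s != NULL);
    return (s->flags & F_CHAROK) ? count_utf8_chars(s->buf, s->len)
                                 : s->len;
}
```

With the five bytes "caf\xc3\xa9", op_length() reports 5 while the data is still neutral and 4 once op_treat_as_chars() has set the bit. Note that op_length() pays for one flag test on every call, which is the cost being debated in this thread.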
Might be workable, as long as this will not force extra tests in
places that are unrelated to processing characters.
But I don't see how straight I/O, for instance, can manage to
stay fast and without gratuitous conversions to/from the utf8
representation. (Most code that reads files uses chop().)
In fact, my intuition says converting the data *once* to/from utf8
will provide better performance than perpetually testing to see if
the data is represented as bytes or in utf8 inside every operation
that needs to deal with the data.
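
A minimal sketch of the "convert once" alternative, again in standalone C with hypothetical function names. It assumes, purely for illustration, that the input is Latin-1; the thread does not say which source encoding is involved. The data is converted to utf8 once at the input boundary, and the chop-like routine afterwards works on utf8 unconditionally, with no per-op test of the representation.

```c
#include <assert.h>
#include <stddef.h>

/* Convert Latin-1 bytes to UTF-8 once, at the input boundary.
   `out` must have room for 2 * n bytes. Returns bytes written.
   (Latin-1 as the source encoding is an assumption of this sketch.) */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out) {
    size_t o = 0;
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];                       /* ASCII: one byte */
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return o;
}

/* chop-like helper: return the byte length after dropping the last
   character of a UTF-8 buffer. It never asks "bytes or utf8?"
   because the data is always utf8 by this point. */
static size_t utf8_chop(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    if (len == 0)
        return 0;
    len--;                                          /* step onto last byte */
    while (len > 0 && (buf[len] & 0xC0) == 0x80)
        len--;                                      /* skip continuation bytes */
    return len;
}
```

For the Latin-1 line "caf\xe9\n" (5 bytes) the conversion yields 6 bytes of utf8; one utf8_chop() removes the newline (length 5) and a second removes the two-byte e-acute as a single character (length 3). The conversion cost is paid once per read, and the I/O concern raised above is exactly whether that one-time cost is acceptable on the fast path.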