perl-unicode

Re: Unicode aware module

1999-06-17 14:38:24
Nick Ing-Simmons writes:
: Ilya Zakharevich <ilya(_at_)math(_dot_)ohio-state(_dot_)edu> writes:
: 
: >> 
: >> Tk would care. It needs to know encoding to map strings to fonts/indices.
: >> However bitmaps are binary data.
: >
: >I think you realize that this can only support my position: Tk should
: >be able to determine what the given SV contains, *and behave
: >accordingly*.
: 
: That was exactly what I was trying to do - support your position.
: Sorry if the fact that I replied to _your_ message made that less obvious.

I'd say that if Tk has blithely changed its interface to take utf-8
instead of latin-1 then it has *broken* its contract with the user.

All this discussion misses the point that we're dealing with a bunch of
existing interfaces.  The old interfaces *specify* narrow characters.
There is no way to wave a magic wand over these interfaces and expect
the modules behind those interfaces to suddenly start behaving both
differently and coherently.

Ilya keeps saying things like the current design has no point of view,
but this is not terribly fair.  The current design has the point of
view that the tail can't wag the dog.  If you want a different
interface to a module, you're going to have to *specify* a different
interface to the module, and get the cooperation of the module.
Trying to force polymorphic data down a hole meant for octets is a bit
like suddenly telling your wife you've decided she has to share the
house with a second wife.  She may or may not put up with it.  Setting
a global flag that changes all interfaces to be polymorphic is even
worse.  That's like trying to pass a law that everyone in town must
marry a second wife.

I'm not against schemes for autogenerating utf-8 aware modules from
non-utf-8 aware modules where that's practical.  But we must be aware
that it's a different module with a different interface, and that the
correctness of automatic translation is about as decidable as the
halting problem.

: >Note that the "according" behaviour when obtaining a bitmap given by a
: >string with chars > 255 is failure.
: >
: >> IO is going to need to do encoding conversion in _some_ cases if all this 
: >> is to be any use. If I am displaying multi-lingual documents from 
: >> different files I need to convert iso-8859-X on input to (say) UTF8.
: >> The inverse on output. But JPEG images etc. are not converted!
: >
: >Moreover, all that not marking "wide" strings as such does is move
: >the responsibility of bookkeeping to the user side.
: 
: Which is the _only_ place that knows. Consider a JPEG image embedded in,
: say, an XML page of text in an Indian script. Mostly UTF8, but between
: these tags it is binary.
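Nick's mixed-content case can be illustrated with a rough Python analogy (not Perl, and not any real module's API). The assumption is that the caller, who knows the document layout, labels each segment as either "text" in a legacy 8-bit charset or opaque "binary" data. Only the text segments are converted to UTF-8 on input, and back again on output. The names `normalize_input`, `denormalize_output` and the default charset `iso-8859-1` are hypothetical choices for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (kind, raw bytes), where kind is "text" or "binary".
Segment = Tuple[str, bytes]


def _recode_text(segments: List[Segment], src: str, dst: str) -> List[Segment]:
    """Re-encode text segments from src to dst; leave binary segments untouched."""
    out: List[Segment] = []
    for kind, data in segments:
        if kind == "text":
            out.append((kind, data.decode(src).encode(dst)))
        elif kind == "binary":
            # JPEG bytes and the like must never go through charset conversion.
            out.append((kind, data))
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    return out


def normalize_input(segments: List[Segment], legacy: str = "iso-8859-1") -> List[Segment]:
    """On input: legacy 8-bit text becomes UTF-8."""
    return _recode_text(segments, legacy, "utf-8")


def denormalize_output(segments: List[Segment], legacy: str = "iso-8859-1") -> List[Segment]:
    """The inverse on output; raises UnicodeEncodeError for characters the legacy charset lacks."""
    return _recode_text(segments, "utf-8", legacy)
```

The point of the sketch is where the knowledge lives: the labels come from the caller, so no heuristic inside the I/O layer has to guess which bytes are text.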

Yes, and this sort of thing is pervasive.  Every interface is a
contract between the user and the usee, and you can't just up and
change the contract without the agreement of both parties.  Well, you
can, but in that case you'd better be willing to declare war on the
other party.  And if your program declares war on byte-oriented
modules, there will certainly be casualties.  It's much better
to take a cooperative approach.

: >Having Perl do bookkeeping *may* have some impact on speed, but my
: >guesstimates are that it will be circa 1% or below.  Most OPs do not
: >care whether their arguments are narrow or wide, and for those which
: >care it is only a check of a bit of SvFLAGS to jump to a proper section.
: 
: Even if it is a little slower it may be _necessary_ to pay the price.
: There is no point in a 100mph car if it cannot turn corners...
: well not for general use anyway ;-)
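Ilya's cost estimate rests on width being a single flag bit that only width-sensitive operations consult. A rough Python analogy follows. It does not reproduce perl's actual SV layout. `Scalar`, `WIDE`, `op_length` and `op_concat` are hypothetical names, and narrow strings are assumed to hold Latin-1 octets.

```python
from dataclasses import dataclass

WIDE = 0x1  # hypothetical flag bit, standing in for a bit in SvFLAGS


@dataclass
class Scalar:
    payload: bytes  # UTF-8 octets if WIDE is set, else one Latin-1 octet per char
    flags: int = 0


def op_length(s: Scalar) -> int:
    """Character count: one bit test selects the narrow or the wide code path."""
    if s.flags & WIDE:
        return len(s.payload.decode("utf-8"))
    return len(s.payload)


def op_concat(a: Scalar, b: Scalar) -> Scalar:
    """Concatenation only has to care about width when the operands disagree."""
    if (a.flags & WIDE) == (b.flags & WIDE):
        return Scalar(a.payload + b.payload, a.flags & WIDE)

    def upgrade(s: Scalar) -> bytes:
        # Narrow operand assumed to be Latin-1; re-encode it as UTF-8.
        return s.payload if s.flags & WIDE else s.payload.decode("latin-1").encode("utf-8")

    return Scalar(upgrade(a) + upgrade(b), WIDE)
```

When both operands have the same width, the only extra work is the bit comparison, which is the overhead Ilya is estimating.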

It bugs me that we're trying to reinvent the type system here.  I think
if we want to tag strings dynamically with types, we should use bless.
If an interface wants polymorphic strings, it should specify that it
wants to deal with string objects.  Other than that, code should know
at compile time what type of string it's dealing with so that it can do
so efficiently.  And that's precisely what lexical scoping provides.  I
do not believe the current implementation is nonsensical, frequent
assertions to the contrary notwithstanding.  Nor do I believe it is
perfect, though if anyone wants to make that assertion I'm all ears.  :-)
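Larry's alternative can be illustrated in Python by analogy, with classes playing the role of blessed packages: type tags exist only where an interface explicitly asks for string objects, and a byte-oriented interface keeps its octets-only contract. All class and function names below are hypothetical.

```python
class TypedString:
    """Base class for strings that carry their type at run time (a stand-in for a blessed reference)."""

    def __init__(self, octets: bytes):
        self.octets = octets

    def as_text(self) -> str:
        raise NotImplementedError


class Latin1String(TypedString):
    def as_text(self) -> str:
        return self.octets.decode("latin-1")


class Utf8String(TypedString):
    def as_text(self) -> str:
        return self.octets.decode("utf-8")


def draw_label(s: TypedString) -> str:
    """Polymorphic interface: its contract states that it takes string objects."""
    if not isinstance(s, TypedString):
        raise TypeError("draw_label expects a TypedString object")
    return s.as_text()


def write_octets(buf: bytearray, data: bytes) -> None:
    """Byte-oriented interface: its contract says octets, and it keeps that contract."""
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("write_octets expects raw octets")
    buf.extend(data)
```

Code that already knows which kind of string it holds never pays for a run-time check. Only an interface that opts into polymorphism dispatches on the object's type.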

Larry
