perl-unicode

Re: Unicode aware module

1999-06-13 02:03:33
On Sun, 13 Jun 1999 04:21:03 EDT, Ilya Zakharevich wrote:
Please tell me whether

  sub foo { return $1 if s/\btypedef\s+int\s+(\w+)// }

is operating on bytes or characters.

Characters.  But that's a poor example of what I'm driving at.
chop($foo) would be a better example.  Would you want that to
remove one byte or one character?  If you strictly want it to
remove just one byte, you had better say C<use byte>, because
otherwise it is going to operate with whatever notion of
"character" may be in effect.
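
To make the chop() question concrete, here is a minimal sketch of the byte-versus-character difference. It assumes a later Perl with the Encode module, which did not exist when this thread was written, and it switches between the two views by encoding the data explicitly rather than through the C<use byte> pragma under discussion. The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# A character string: "caf" followed by U+00E9 (e with acute accent).
my $chars = "caf\x{e9}";

# The same text as UTF-8 encoded bytes; U+00E9 becomes 0xC3 0xA9.
my $bytes = encode_utf8($chars);

my $chars_len = length $chars;    # counts characters
my $bytes_len = length $bytes;    # counts bytes

# chop() on the character string removes the whole last character.
chop(my $chopped_chars = $chars);

# chop() on the byte string removes only the final byte, leaving
# a dangling 0xC3 lead byte behind.
chop(my $chopped_bytes = $bytes);

printf "%d characters, %d bytes\n", $chars_len, $bytes_len;
```

The same chop() call gives a different result depending on which notion of "character" applies to its argument, which is the ambiguity being argued over here.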

                   It has nothing to do with the data that may be
given to it.  IOW, C<use byte> is unrelated to utf8--it denotes
a property of the code.

Then we need to determine another way to say that the subroutine
operates on a *sequence of integers 0..255 packed into a sequence of
bytes* (which is C<no utf8>, required if we have a globalutf8 pragma).

Uhh, that's what C<use byte> is.  If the code wants to play with
bytes, C<no utf8> makes little sense.  You'll have to exhaustively
deny all possible current and future character encodings via
C<no utf16>, C<no big5>, ad nauseam.  (We're talking about
hypothetical encodings yet to be supported, but you get the idea.)
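
The idea of a pragma that describes the code rather than the data can be sketched with the `bytes` pragma that later Perl releases ship. Treating it as equivalent to the C<use byte> proposed in this thread is an assumption, and the perl documentation discourages `bytes` for anything but debugging, so this is only an illustration of lexical scoping.

```perl
use strict;
use warnings;

# One character, U+263A, which needs three bytes in UTF-8.
my $smiley = "\x{263A}";

# Outside the pragma, length() counts characters.
my $char_count = length $smiley;

my $byte_count;
{
    # Lexically scoped: only the code in this block is affected.
    # The data in $smiley is not changed in any way.
    use bytes;
    $byte_count = length $smiley;    # bytes of the internal UTF-8 form
}

# Back outside the block, character semantics apply again.
my $count_after = length $smiley;

print "$char_count character, $byte_count bytes\n";
```

The string is identical in all three calls; only the enclosing code's declaration changes the answer, and nothing has to be said about which encodings are being excluded.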

I do not follow you: if we can detect a mismatch, why not do "a right
thing" instead of complaining?  (Of course, for performance reasons
it should be possible to switch this off.)

If you did the "right thing" automatically, there would be no way
to tell if you got utf8 data when you were strictly expecting
utf16 data.  So, no, you can't do the "right thing" automatically.

Why would you care if the answer is correct?

I'm not convinced that you can guarantee the answer will be correct.
Consider a piece of code that must convert raw utf8 data to utf16.
Will it "do the right thing" when globalutf16 (or whatever) is
in effect?
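
A conversion routine of the kind described above might look like the following sketch. It assumes the Encode module from later Perl releases, and `utf8_to_utf16be` is a hypothetical helper name. The input is treated explicitly as raw bytes and the output encoding is named explicitly, so the result does not depend on any ambient notion of "character"; malformed input is rejected rather than silently repaired.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert raw UTF-8 bytes to raw UTF-16BE bytes.
sub utf8_to_utf16be {
    my ($raw_utf8) = @_;

    # Decode strictly: die on malformed UTF-8 instead of guessing.
    # LEAVE_SRC keeps decode() from modifying the input in place.
    my $text = decode('UTF-8', $raw_utf8,
                      Encode::FB_CROAK() | Encode::LEAVE_SRC());

    # Encode the character string as big-endian UTF-16, without a BOM.
    return encode('UTF-16BE', $text);
}

# U+00E9 is 0xC3 0xA9 in UTF-8 and 0x00 0xE9 in UTF-16BE.
my $utf16 = utf8_to_utf16be("\xC3\xA9");
printf "%vX\n", $utf16;
```

Because the routine dies on input that is not valid UTF-8, a caller can still tell when it was handed data in an encoding it did not expect.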


Sarathy
gsar(_at_)activestate(_dot_)com
