Re: Unicode aware module

On Sun, 13 Jun 1999 05:23:25 EDT, Ilya Zakharevich wrote:

On Sun, Jun 13, 1999 at 01:46:41AM -0700, Gurusamy Sarathy wrote:

On Sun, 13 Jun 1999 04:21:03 EDT, Ilya Zakharevich wrote:

Please tell me whether

  sub foo { return $1 if s/\btypedef\s+int\s+(\w+)// }

is operating on bytes or characters.


Characters.


Wrong.  You cannot win.  ;-)


You already know I'm going to say the same to you. ;-)

Apparently the above sub processes a C file, so it should enforce "C"
locale, and in Perl-speak it means it is operating on bytes.


You're mixing up characters and bytes.  Only characters are affected
by locale.  So, if the code should "conform" to the C locale, it
is by definition operating on characters.  (That characters are defined
to be bytes in the "C" locale is irrelevant to this discussion.  They
might as well be Huffman encoded, for all I care.)

But that's a poor example of what I'm driving at.
chop($foo) would be a better example.  Would you want that to
remove one byte or one character?


I do not care as long as it chop()s.  If $foo contains bytes, let
it chop a byte.  If $foo contains chars, let it chop a char.


I put it to you that it is impossible to know whether the program
"wants" to treat it as bytes or characters, irrespective of what
the data is supposed to be.  If you don't buy this, I guess
there's no point arguing this point.
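
A minimal sketch of the distinction behind the chop() question, assuming
a present-day Perl with the core Encode module (which postdates this
thread).  The sample string and the variable names are illustrative
only.  It shows the same text held once as raw UTF-8 bytes and once as
decoded characters, and what chop() removes in each case:

```perl
use strict;
use warnings;
use Encode qw(decode);

# UTF-8 encoding of "cafe" with an acute accent on the final letter:
# that letter (U+00E9) occupies two bytes, 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# The same text as a character string.
my $chars = decode('UTF-8', $bytes);

printf "before chop: %d bytes, %d characters\n",
    length($bytes), length($chars);

# chop() on the byte string removes only the trailing 0xA9,
# leaving a dangling 0xC3 lead byte behind.
my $byte_copy = $bytes;
chop($byte_copy);

# chop() on the character string removes the whole accented letter.
my $char_copy = $chars;
chop($char_copy);

printf "after chop:  %d bytes, %d characters\n",
    length($byte_copy), length($char_copy);
```

The call is identical in both cases; only how the string was produced
decides whether a byte or a character is removed, which is the
ambiguity under dispute here.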

Uhh, that's what C<use byte> is.  If the code wants to play with
bytes, C<no utf8> makes little sense.  You'll have to exhaustively
deny all possible current and future character encodings via
C<no utf16>, C<no big5>, ad nauseam.  (We're talking about
hypothetical encodings yet to be supported, but you get the idea.)


I do not think we need more than two internal encodings: a quick
American, and a slow Universal.  All the others should be done as i/o
filters.


Try saying that to someone from China, or India.  :-)

I'm not convinced that you can guarantee the answer will be correct.
Consider a piece of code that must convert raw utf8 data to utf16.
Will it "do the right thing" when globalutf16 (or whatever) is
in effect?


I do not see a place for such code in Perl.


These are arbitrary restrictions you speak of, and I (naturally :)
don't agree.


Sarathy
gsar(_at_)activestate(_dot_)com
