Re: utf8_heavy noise

On Sun, Jun 22, 2003 at 05:28:03PM -0400, Daniel Yacob wrote:

For your information:
Unicode 4.0 adds two sets of decimal digits.  :-)

1946..194F    ; Nd #  [10] LIMBU DIGIT ZERO..LIMBU DIGIT NINE
104A0..104A9  ; Nd #  [10] OSMANYA DIGIT ZERO..OSMANYA DIGIT NINE


Thanks!  I wasn't aware of these additions.  I gave them a try
but it appears Perl 5.8.0 was treating Unicode 4.0 chars as invalid.


In what way invalid?

My GNOME terminal seemed to be converting Osmanya into something else
also.


Unicode 4.0 came out this spring (about 9 months after Perl 5.8.0), so
I wouldn't be surprised if much software (or data, like fonts) isn't
yet updated for it.
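
One way to check whether a given Perl already knows about the two digit ranges quoted above is to test their first code points against \d. This is only a rough sketch: the hash and variable names are made up for illustration, and it assumes that \d follows the Unicode Nd property for characters above U+00FF. A Perl built with Unicode 4.0 or later tables should report "yes" for both; an older one such as 5.8.0 would be expected to report "no".

```perl
use strict;
use warnings;

# First code point of each digit range added in Unicode 4.0 (from the listing above).
my %digit_zero = (
    'LIMBU DIGIT ZERO'   => 0x1946,
    'OSMANYA DIGIT ZERO' => 0x104A0,
);

# Record whether this Perl's \d recognises each character as a decimal digit.
my %matches;
for my $name (sort keys %digit_zero) {
    my $char = chr $digit_zero{$name};
    $matches{$name} = ($char =~ /\d/) ? 1 : 0;
    printf "U+%04X %-18s matched by \\d: %s\n",
        $digit_zero{$name}, $name, $matches{$name} ? 'yes' : 'no';
}
```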

I'd like to bring up another utf8 issue.  My scripts that work with
utf8 text always seem to start with:

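use utf8;                        # the script source itself is UTF-8
if ( $] >= 5.007 ) {             # PerlIO layers need Perl 5.7 or later
    binmode(STDOUT, ":utf8");    # encode everything printed to STDOUT as UTF-8
}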


It would be nice if "use utf8" set IO modes for utf8 automagically.
Perhaps an argument could be passed to the pragma, such as:  use utf8 ':all'
(or something), that sets everything to utf8 that is settable.


And fixing that in Perl 5.8.1 would help Perl 5.8.0 how? :-)

But more seriously, "use utf8" is "an evolutionary dead end".
The only thing it means these days is "my script is in UTF-8".
As for "all the other" things, I don't think there can ever be a
consensus on what "all those things" are, since there are so many of them.
Better to be very explicit about the things you want to "UTF-8-ize".
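
Being explicit could look roughly like the sketch below, which sets a layer on each handle that should carry UTF-8 rather than relying on "use utf8". It assumes Perl 5.8 or later with PerlIO; the in-memory buffer and the variable names are only stand-ins for a real file, and ":encoding(UTF-8)" is used here as a stricter alternative to the ":utf8" layer shown earlier in the thread.

```perl
use strict;
use warnings;
use utf8;   # declares only that this source file is written in UTF-8

# Explicitly mark each standard handle that should carry UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
binmode(STDERR, ':encoding(UTF-8)');

# Files get their layer at open time; an in-memory buffer stands in for a file here.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer) or die "open failed: $!";
print {$out} "\x{1946}";    # LIMBU DIGIT ZERO, a single character
close($out) or die "close failed: $!";

# The buffer now holds the encoded bytes, not the character itself.
my $byte_count = length $buffer;
print "bytes written: $byte_count\n";
```

Nothing is switched on behind the script's back this way: a handle without a layer keeps its default behaviour.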

cheers,

/Daniel


-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen