perl-unicode

Re: InLanguage properties? [Was Re: Encode-InCharset-0.01 Released]

2002-05-04 02:53:57
On Fri, May 03, 2002 at 05:52:37PM +0900, Dan Kogai wrote:
> To overcome this shortage Unicode does have character properties and 
> you can get which I<script> it belongs to using that.  But unfortunately 
> that was not the case for the origins of character repertoire (so I made 
> one (Encode-InCharset) because I needed it).  Neither is the case for 
> Languages.

This seems to be one of those ideographic/alphabetic splits. The
identity of alphabetic characters even across languages is more or less
clear; even without Unicode, I would perceive Latin-* as merely being
subsets of some larger character set. There's no reason why German users
use Latin-1 rather than Latin-2, or -3, or -4; it's just a matter of who
they trade with most. Since you can write many languages in several of
the ISO 8859 series, and write several languages in each 8859 charset,
language is something users of the Latin script never strongly
associated with charset. ISO-2022-like things just don't express this
well.  OTOH, from listening to Japanese users, I get the impression that
ISO-2022 fits their view of characters - GB2312 is totally separate from
JIS X 0208 and two characters in different charsets are inherently
different. Whence comes a lot of the Unicode flaming.
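
To make the point about the ISO 8859 series concrete, here is a rough
sketch in Python (chosen only for brevity, not because the thread uses
it). The sample words and the helper name charsets_that_can_encode are
my own assumptions; the codec names are the ones Python ships for
Latin-1 through Latin-4. It simply asks which of those four charsets
can encode a given string.

```python
# Illustrative sketch: sample strings and the helper name are assumptions,
# not taken from the original message.
LATIN_CHARSETS = ["iso8859_1", "iso8859_2", "iso8859_3", "iso8859_4"]


def charsets_that_can_encode(text, charsets=LATIN_CHARSETS):
    """Return the charsets able to encode every character of `text`."""
    usable = []
    for name in charsets:
        try:
            text.encode(name)  # raises if any character is missing
        except UnicodeEncodeError:
            continue
        usable.append(name)
    return usable


german = charsets_that_can_encode("Straße über Köln")
polish = charsets_that_can_encode("Łódź")
print("German sample:", german)
print("Polish sample:", polish)
```

The German sample should be encodable in all four charsets, while the
Polish one fits only Latin-2, so the charset by itself says little about
which language the text is in.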

-- 
David Starner - starner(_at_)okstate(_dot_)edu
"It's not a habit; it's cool; I feel alive. 
If you don't have it you're on the other side." 
- K's Choice (probably referring to the Internet)
