perl-unicode

Re: Encode: CJK-Guide

2002-03-26 21:35:13
On Tue, Mar 26, 2002 at 07:16:01PM -0500, Jungshik Shin wrote:
> BTW, I don't find any reference to Microsoft code pages
> (CP949 for Korean, CP950, CP936, and CP932), JOHAB (Korean), and
> Big5-HKSCS. Is that because they're not yet supported (well, Shift-JIS
> and Big5 are supported)?

CP949 is there in Encode::KR. CP950 is in Encode::TW. CP936 is in
Encode::CN. CP932 is in Encode::JP.

I put Big5-HKSCS into Encode::TW; Dan later renamed it to big5-hk.ucm.
I don't think that's a good idea, though... Dan, could you explain the
reason?

> > As a result, something funny has happened. For example, U+673A means
> > "a machine" in Simplified Chinese but "a desk" in Japanese. "A machine"
> > in Japanese is U+6A5F.

> Do you really believe this is a strong case against Han Unification?
> I don't see any problem with this. There are a number of
> Chinese characters with multiple meanings even without Han
> Unification. Do those 'meanings' have to be assigned separate
> code points?

Dan probably thinks that U+673A in Simplified Chinese script and in
Japanese/Traditional Chinese script should be assigned two different
code points.

Unicode does distinguish between MODIFIER LETTER PRIME (U+02B9) and
PRIME (U+2032) by their usage (letter vs. symbol), even though the two
look the same.

> > So you can't tell what it means just by looking at the code.
> Why does a coded character set have to care about what computational
> linguists have to do? You can't tell the meaning of
> any English word with multiple meanings by just looking at
> its computer representation without context/grammatical/linguistic/lexical
> analysis, can you? How do you know what 'fly' means without context?

How about "So you can't tell which Script it means just by looking at the code"?
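A small illustration of that point, sketched in Python rather than with Perl's Encode (an assumption made only for brevity). It relies solely on the standard unicodedata module and Python's built-in gb2312 and shift_jis codecs. The one code point U+673A round-trips through both a Simplified Chinese charset and a Japanese one. Nothing in the code point tells you which script, and therefore which reading, was intended.

```python
import unicodedata

# U+673A: "machine" in Simplified Chinese, "desk" in Japanese (per the thread).
DESK_OR_MACHINE = "\u673a"
# U+6A5F: the character Japanese uses for "machine" (per the thread).
MACHINE_JA = "\u6a5f"

for ch in (DESK_OR_MACHINE, MACHINE_JA):
    # Both are plain unified ideographs; the name carries no script information.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# The same code point encodes and decodes cleanly in a Simplified Chinese
# charset (GB2312) and in a Japanese one (Shift_JIS).
for codec in ("gb2312", "shift_jis"):
    raw = DESK_OR_MACHINE.encode(codec)
    assert raw.decode(codec) == DESK_OR_MACHINE
    print(codec, raw.hex())
```

Only the surrounding context (a language tag, or the legacy charset the text came from) recovers the script. That is the distinction being asked about here.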

/Autrijus/
