Re: Encode: CJK-Guide


Here's some feedback.

Republic of Korea (South Korea; simply "Korea" as follows) has set
KS C 5601 in 1989.  They are both based upon JIS C 6226, could be one of the


  KS C 5601 was first issued in 1987 and revised in 1989 and
1992. Then, it was renamed and reissued as KS X 1001:1998 in
1998.

Though there are escape-based encodings for these two (ISO-2022-CN
and ISO-2022-KR, respectively), they are hardly used in favor of EUC.


  ISO-2022-KR used to be widely used for Korean email exchange,
as ISO-2022-JP still is for Japanese. ISO-2022-KR is hardly used now,
but it was in wide use until at least the late 1990s (see IETF RFC 1557).
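
  To illustrate how the two encodings differ on the wire, here is a
minimal sketch. It uses Python's built-in codecs rather than Perl's
Encode, and the codec names `euc_kr` and `iso2022_kr` are Python's own
labels, assumed here only for illustration. The sample string is
arbitrary.

```python
# Encode the same Korean text with EUC-KR and with ISO-2022-KR.
text = "\ud55c\uae00"  # the two Hangul syllables of the word "Hangul"

euc = text.encode("euc_kr")      # 8-bit: KS X 1001 bytes have the high bit set
iso = text.encode("iso2022_kr")  # 7-bit: escape sequence plus shift-out/shift-in

print("EUC-KR     :", euc.hex())
print("ISO-2022-KR:", iso)

# Both byte streams decode back to the same text.
print(euc.decode("euc_kr") == iso.decode("iso2022_kr") == text)
```

  The ISO-2022-KR output stays within 7 bits, which is why it suited
email transport that was not 8-bit clean.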

When you say gb2312 and ksc5601, EUC-based encoding is assumed.


  Please, don't help spread this misuse. It might be all right
for the (ignorant) public to say KS C 5601 in place of EUC-KR, but Perl
programmers should learn the difference between KS C 5601/KS X 1001 (a coded
character set) and an encoding/MIME charset/character set encoding scheme/
character coding.
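
  The distinction can be sketched as follows, again in Python rather
than Perl and only as an illustration. The helper name
`ksx1001_position` is hypothetical. It recovers a character's row and
cell in KS X 1001 (the coded character set) from its EUC-KR bytes, then
shows that EUC-KR and ISO-2022-KR serialise that same position
differently.

```python
def ksx1001_position(ch):
    """Return (row, cell) of ch in KS X 1001, derived from its EUC-KR bytes.

    Works only for characters that EUC-KR encodes as exactly two bytes.
    """
    b1, b2 = ch.encode("euc_kr")
    return b1 - 0xA0, b2 - 0xA0


row, cell = ksx1001_position("\ud55c")  # first syllable of "Hangul"

# One position in the character set, two byte representations:
euc_kr_bytes = bytes([row + 0xA0, cell + 0xA0])    # EUC-KR (GR, high bit set)
iso_2022_kr_pair = bytes([row + 0x20, cell + 0x20])  # ISO-2022-KR (GL, 7-bit)

print("KS X 1001 row/cell:", row, cell)
print("EUC-KR bytes      :", euc_kr_bytes.hex())
print("ISO-2022-KR pair  :", iso_2022_kr_pair.hex())
```

  The row/cell pair belongs to KS X 1001; the byte values belong to
whichever encoding carries it.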

  As I wrote before, GB 2312 has been so widely (mis)used that there's
no way to replace it with EUC-CN. The Korean situation is much better,
although not as good as the Japanese one.

  BTW, I don't find any reference to Microsoft code pages
(CP949 for Korean, CP950, CP936, and CP932), JOHAB (Korean), and
Big5-HKSCS. Is that because they're not yet supported (well, Shift-JIS
and Big5 are supported)?

  Another BTW, don't you think your description of Unicode
and Han Unification is a bit too negative and biased?
I know you feel strongly about the subject, but I'm not
sure CJK-Guide is the best place to express your personal
opinion on it. If you'd rather not tone it down or change
it, you could add a disclaimer like 'some people have
reservations about Han Unification and Unicode because ......'
or 'the following is my personal opinion shared by
some people but not universally accepted'.

As a result, something funny has happened.  For example, U+673A means "a
machine" in Simplified Chinese but "a desk" in Japanese.  "A machine"
in Japanese is U+6A5F.


  Do you really believe this is a strong case against Han Unification?
I don't see any problem with this.  There are a number of
Chinese characters with multiple meanings even without Han
Unification. Do those 'meanings' have to be assigned separate
code points?
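
  As a rough check of the U+673A example, the sketch below (Python,
not Perl; the helper `encodable` is hypothetical) asks whether each of
the two characters can be encoded in a GB 2312-based and a JIS-based
legacy encoding. It assumes Python's `gb2312` and `euc_jp` codecs as
stand-ins for the national standards.

```python
import unicodedata


def encodable(ch, codec):
    """Return True if ch can be encoded with the named Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# U+673A and U+6A5F, the two characters from the quoted example.
for ch in ("\u673a", "\u6a5f"):
    support = {codec: encodable(ch, codec) for codec in ("gb2312", "euc_jp")}
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), support)
```

  U+673A is encodable in both, so the same character already carried
its different senses in the national standards.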

So you can't tell what it means just by looking at the code.


  Why does a coded character set have to care about what computational
linguists have to do? You can't tell the meaning of
any English word with multiple meanings just by looking at
its computer representation, without contextual, grammatical, linguistic,
or lexical analysis, can you? How do you know what 'fly' means without context?

  Jungshik Shin