perl-unicode

Re: let's cook it!

2002-03-26 16:28:53

Dan,

I'm sorry for dropping in this late, but I've just joined
the list and found this. 

* rename gb2312 to gb2312-raw, ksc5601 to ksc5601-raw

  What do you mean by ksc5601-raw and gb2312-raw? If you mean
KS C 5601-1987 and GB 2312 put in GL, how about ksc5601-gl
and gb2312-gl?  Please also note that KS C 5601-1992
was reissued and renamed as KS X 1001:1998. Therefore,
it'd be better to use ksx1001 in place of ksc5601 and
make ksc5601-* aliases of ksx1001-*.

* and alias gb2312 and ksc5601 to euc-(cn|kr)

I agree. :)

  Oh, my gosh! Please remove this alias of ksc5601 to EUC-KR. That's the
last thing we need. KS C 5601-1987 is NOT an encoding (or
character set encoding scheme or MIME charset) BUT just
a coded character set which is used in encodings/MIME charsets/
character set encoding schemes like EUC-KR and ISO-2022-KR.
By aliasing ksc5601 to EUC-KR, the only thing we achieve
is to encourage a confusion and a mistake which have to be
avoided at all costs.
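
  To make the distinction concrete, here is a minimal sketch. It is
in Python rather than Perl and is not part of the proposal under
discussion; it assumes the 'euc_kr' and 'iso2022_kr' codecs bundled
with CPython, and the helper name ksx1001_gl_bytes is made up for
illustration. It takes one KS X 1001 character and shows that the two
encodings carry the same two-byte code in different ways: EUC-KR puts
it in GR (high bit set), while ISO-2022-KR puts it in GL behind a
designator and shift characters.

```python
def ksx1001_gl_bytes(ch):
    """Return the 7-bit (GL) KS X 1001 code bytes for a single character.

    Hypothetical helper: derives the GL form by clearing the high bit
    of each byte of the EUC-KR form.
    """
    euc = ch.encode("euc_kr")
    if len(euc) != 2:
        raise ValueError("not a two-byte KS X 1001 character")
    return bytes(b & 0x7F for b in euc)


han = "\ud55c"  # HANGUL SYLLABLE HAN, part of KS X 1001

euc_kr = han.encode("euc_kr")            # code invoked in GR
iso_2022_kr = han.encode("iso2022_kr")   # code invoked in GL, with escapes
gl = ksx1001_gl_bytes(han)

print(euc_kr.hex(), gl, iso_2022_kr)
```

  One coded character set, two different byte streams: that is why
the name of the set cannot stand in for either encoding.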

Well, at least almost every other program (hc, iconv, mozilla...) does
that anyway.

  No, Mozilla doesn't do that. Neither does yudit. Mozilla's
character coding menu does NOT have KS C 5601.

  I wonder how this charset mistaken as encoding has started.  Well, in
the majority of encodings, charsets are applied uncooked so that may be the
reason.

  Wait a moment. You have to be careful here. 'charset' is an overloaded
term. In the MIME sense, 'charset' means the same thing as 'encoding'
(e.g. ISO-2022-JP, ISO-2022-KR, US-ASCII, UTF-8, EUC-KR, EUC-JP,
EUC-CN, ISO-8859-X etc.) and it DOES NOT mean the same thing as
coded character set (JIS X 0208, JIS X 0201, KS X 1001, GB 2312,
CNS 1xxxx, US-ASCII, ISO-8859-x).

  It's unfortunate that the name GB2312 has been so firmly established
in place of EUC-CN. EUC-KR, by contrast, has much stronger support than
EUC-CN despite Microsoft's continuous assault on it, and people
do know that EUC-KR is different from KS X 1001/KS C 5601.

  Microsoft products use 'ks_c_5601-1987' as an encoding name/MIME
charset/character set encoding scheme. That's a very strange use
of KS C 5601-1987, because what they mean by 'ks_c_5601-1987'
is actually CP949/Unified Hangul Code (UHC)/X-Windows-949,
an upward-compatible proprietary extension of EUC-KR. No Korean
standard specifies it. However, apparently, they didn't want to
give the impression that they had come up with something proprietary
(not specified in any Korean nat'l standard) by using 'X-Windows-949',
and decided to use 'ks_c_5601-1987' as the MIME charset for it
although it has no place in the Korean nat'l standard. Mozilla
has to accept 'ks_c_5601-1987' as an alias to 'X-Windows-949'
because MS IE, OE and FrontPage are so widely used.
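
  A rough way to see the "upward compatible extension" for yourself,
again sketched in Python and assuming CPython's bundled 'cp949' and
'euc_kr' codecs (the helper decodes_as is made up for illustration):
a character that is in KS X 1001 gets the same bytes in both, while a
Hangul syllable outside the 2,350 in KS X 1001 gets a CP949 code that
a strict EUC-KR decoder rejects.

```python
def decodes_as(data, codec):
    """Return True if the byte string is valid in the given codec."""
    try:
        data.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


han = "\ud55c"   # HANGUL SYLLABLE HAN, in KS X 1001
ddom = "\ub620"  # HANGUL SYLLABLE DDOM, not in KS X 1001

# Characters inside KS X 1001 are encoded identically.
same_for_ksx1001 = han.encode("cp949") == han.encode("euc_kr")

# The extension area uses byte pairs outside the EUC-KR range.
uhc_only = ddom.encode("cp949")
valid_as_cp949 = decodes_as(uhc_only, "cp949")
valid_as_euc_kr = decodes_as(uhc_only, "euc_kr")

print(same_for_ksx1001, valid_as_cp949, valid_as_euc_kr)
```

  So data labelled 'ks_c_5601-1987' by these products may contain
bytes that are neither KS C 5601-1987 nor valid EUC-KR.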

  Jungshik Shin
