perl-unicode

Re: let's cook it!

2002-03-27 09:04:14
On Wed, 27 Mar 2002, Nick Ing-Simmons wrote:

Autrijus Tang <autrijus(_at_)autrijus(_dot_)org> writes:
On Tue, Mar 26, 2002 at 06:28:07PM -0500, Jungshik Shin wrote:
  Microsoft products use 'ks_c_5601-1987' as an encoding name/MIME
charset/character set encoding scheme. That's a very strange use
of KS C 5601-1987, because what they mean by 'ks_c_5601-1987'
is actually CP949/Unified Hangul Code (UHC)/X-Windows-949,
an upward-compatible proprietary extension of EUC-KR.

Just a quick note: exactly the same thing has happened with Microsoft's
use of 'gb2312' to mean 'gbk', and 'big5' to mean 'cp950'. In Encode.pm,
I've been carefully avoiding this misbehaviour; it has been fortunate that
'ks_c_5601_1987' has a distinct name from 'ksc5601'. :-)

At least they are consistently wrong across the world; most MS things
claiming to be iso-8859-1 are really cp1252.

  Well, not really. MS registered Windows-125x with IANA and uses
Windows-125x in their products consistently. It's NOT MS products (MS OE, IE,
Frontpage) BUT broken programs like Eudora (with very little notion of
I18N and MIME charset) that run under MS Windows that label Windows-125x
documents as ISO-8859-x. I don't like MS, but they shouldn't be blamed
for what's not their fault.
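
  To make the mislabeling problem concrete, here is a minimal Python sketch
(not anything from Encode.pm) of what a reader sees when Windows-1252 bytes
are decoded under an ISO-8859-1 label. The sample bytes and the helper name
looks_like_cp1252 are made up for illustration, and the check is only a
heuristic.

```python
# Bytes 0x93 and 0x94 are curly double quotes in Windows-1252,
# but invisible C1 control characters in ISO-8859-1.
raw = b"\x93caf\xe9\x94"

as_cp1252 = raw.decode("cp1252")      # what the sender meant
as_latin1 = raw.decode("iso-8859-1")  # what the wrong label produces

print(repr(as_cp1252))
print(repr(as_latin1))


def looks_like_cp1252(data: bytes) -> bool:
    """Hypothetical heuristic: real ISO-8859-1 text almost never uses
    bytes 0x80-0x9F, so their presence hints at a mislabeled
    Windows-125x document."""
    return any(0x80 <= b <= 0x9F for b in data)
```

Both decodes succeed, which is exactly why the wrong label goes unnoticed:
ISO-8859-1 accepts every byte value, so the error only shows up as missing
or odd-looking punctuation.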

  MS should have registered CP949/950 as Windows-949/950
instead of labeling them misleadingly as ks_c_5601-1987 and big5. In the case
of gb2312, gbk should have been registered and used. I don't know about big5,
but in the Korean case, apparently they tried to pretend that they follow
the Korean Nat'l std. while they extended it in a proprietary way.
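
  For anyone who has to read data carrying these labels, one workaround is
to map each label to the code page it denotes in practice before decoding.
The following is only a rough Python sketch: the table MS_LABEL_TO_CODEC and
the function decode_ms_labelled are hypothetical names, the three mappings
are just the ones mentioned in this thread, and the codec names are Python's.

```python
# Hypothetical table: labels as used by the products discussed above,
# mapped to the code pages they actually denote.
MS_LABEL_TO_CODEC = {
    "ks_c_5601-1987": "cp949",
    "gb2312": "gbk",
    "big5": "cp950",
}


def decode_ms_labelled(data: bytes, label: str) -> str:
    """Decode data using the superset code page for a known label,
    falling back to the label itself when it is not in the table."""
    codec = MS_LABEL_TO_CODEC.get(label.lower(), label)
    return data.decode(codec)


# U+B620 is a Hangul syllable outside KS X 1001 (KS C 5601), so its
# two-byte form exists only in the CP949/UHC extension.
uhc_bytes = "\ub620".encode("cp949")
text = decode_ms_labelled(uhc_bytes, "ks_c_5601-1987")

# The same bytes are not valid when read strictly as EUC-KR.
try:
    uhc_bytes.decode("euc_kr")
    strict_euc_kr_ok = True
except UnicodeDecodeError:
    strict_euc_kr_ok = False
```

Since CP949 is upward compatible with EUC-KR, decoding with the superset
does not lose anything for data that really is plain EUC-KR.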

  Jungshik Shin
