perl-unicode

Re: Encoding vs Charset

2002-03-26 19:34:44
On Tue, 26 Mar 2002, Jungshik Shin wrote:

really means euc-cn and charset="ks_c_5601-1987" really means euc-kr.
Sadly this misconception is embedded in popular browsers.

M$ OE and M$ FrontPage keep producing HTML docs. However,
it also has to be noted that the encoding
designated as 'ks_c_5601-1987' by M$ is NOT EUC-KR itself
but their proprietary extension of EUC-KR, namely
CP949/UHC/(X-)Windows-949.

  Therefore, I'd like to suggest (or rather do) for Korean encodings that:

  - Add an X-Windows-949 converter
  - Make 'ks_c_5601-1987', 'X-UHC', 'UHC',
    and 'CP949' aliases of 'X-Windows-949'
  - Add a JOHAB converter
  - Remove the 'ksc5601' alias for 'euc-kr'.
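
To make the alias proposal concrete, here is a minimal sketch of the resulting label table. It is written in Python purely as an illustration and is not Encode's actual alias mechanism. KOREAN_ALIASES and resolve_korean_label are hypothetical names, and Python's built-in 'cp949' codec stands in for the proposed X-Windows-949 converter.

```python
from typing import Optional

# Hypothetical alias table mirroring the proposal above.
# Keys are lowercased charset labels; values are Python codec names.
KOREAN_ALIASES = {
    "x-windows-949": "cp949",
    "ks_c_5601-1987": "cp949",  # what M$ products actually emit under this label
    "x-uhc": "cp949",
    "uhc": "cp949",
    "cp949": "cp949",
    "euc-kr": "euc_kr",
    "johab": "johab",
    # 'ksc5601' is deliberately absent, per the last item of the proposal.
}


def resolve_korean_label(label: str) -> Optional[str]:
    """Return the codec name for a charset label, or None if it is not listed."""
    return KOREAN_ALIASES.get(label.strip().lower())
```

With this table, resolve_korean_label("KS_C_5601-1987") yields "cp949", while resolve_korean_label("ksc5601") yields None, so the caller has to decide how to treat that ambiguous label.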

Since some existing data is in X-Windows-949 but mislabeled
as EUC-KR, it might be necessary to make the 'euc-kr' -> Unicode
converter generous, so that it acts as an 'X-Windows-949' -> Unicode
converter (whether or not this is desirable and necessary depends
on what applications Encode may be used for).
However, in the other direction (Unicode -> euc-kr)
the converter has to comply strictly with the standard.
See <http://bugzilla.mozilla.org/show_bug.cgi?id=131388>
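
The "generous on input, strict on output" idea can be sketched as follows. This is a Python illustration under stated assumptions, not Encode's implementation. Both function names are hypothetical, Python's 'cp949' codec stands in for X-Windows-949, and the strict direction uses a simple byte-range check (both bytes in 0xA1..0xFE) as an approximation of "inside the EUC-KR plane".

```python
def decode_korean_generously(data: bytes) -> str:
    """Decode bytes labeled EUC-KR, also accepting CP949/UHC extension bytes."""
    # CP949 extends EUC-KR, so ordinary two-byte EUC-KR input
    # decodes to the same characters through this codec.
    return data.decode("cp949")


def encode_euc_kr_strictly(text: str) -> bytes:
    """Encode to EUC-KR, rejecting characters found only in the CP949 extension."""
    out = bytearray()
    for i, ch in enumerate(text):
        encoded = ch.encode("cp949")  # raises UnicodeEncodeError if unmappable
        # Assumption: a two-byte code with any byte outside 0xA1..0xFE
        # belongs to the proprietary extension, not to EUC-KR proper.
        if len(encoded) == 2 and not all(0xA1 <= b <= 0xFE for b in encoded):
            raise UnicodeEncodeError(
                "euc-kr", text, i, i + 1, "character is outside the EUC-KR range"
            )
        out += encoded
    return bytes(out)
```

For example, the syllable U+B620, which CP949 can represent but the basic EUC-KR Hangul set cannot, is accepted by decode_korean_generously when it arrives as CP949 bytes, but is rejected by encode_euc_kr_strictly.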

  Jungshik Shin

