On Tue, 26 Mar 2002, Jungshik Shin wrote:
really means euc-cn and charset="ks_c_5601-1987" really means euc-kr.
Sadly this misconception is embedded in popular browsers.
M$ OE, M$ Frontpage keep producing html docs. However,
it should also be noted that the encoding
designated as 'ks_c_5601-1987' by M$ is NOT the same as
EUC-KR but their proprietary extension of EUC-KR, namely
CP949/UHC/(X-)Windows-949.
Therefore, I'd like to suggest (or rather do) the following for Korean encodings:
- Add an X-Windows-949 converter
- Make 'ks_c_5601-1987', 'X-UHC', 'UHC',
and 'CP949' aliases of 'X-Windows-949'
- Add a JOHAB converter
- Remove the 'ksc5601' alias for 'euc-kr'.
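To make the proposed alias layout concrete, here is a rough sketch, in Python rather than Perl. The table and the resolve_korean_label() helper are hypothetical names made up for illustration; they are not Encode's API. The only thing assumed about Python is that it ships codecs named cp949, euc_kr and johab.

```python
import codecs

# Hypothetical alias table mirroring the proposal above.
# Keys are lower-cased charset labels; values are Python codec names.
KOREAN_ALIASES = {
    "x-windows-949": "cp949",
    "ks_c_5601-1987": "cp949",
    "x-uhc": "cp949",
    "uhc": "cp949",
    "cp949": "cp949",
    "euc-kr": "euc_kr",
    "johab": "johab",
    # 'ksc5601' is deliberately absent: the proposal removes that alias.
}


def resolve_korean_label(label):
    """Return the codec name for a charset label, or None if it is not listed."""
    return KOREAN_ALIASES.get(label.strip().lower())


# Sanity check: every alias target must be a codec Python actually knows.
for codec_name in set(KOREAN_ALIASES.values()):
    codecs.lookup(codec_name)
```

With this table, a label such as 'KS_C_5601-1987' resolves to the CP949 converter, while 'ksc5601' is simply unknown.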
Since some existing data is in X-Windows-949 but mislabeled
as EUC-KR, it might be necessary to make the 'euc-kr' -> Unicode
converter generous, so that it acts as an 'X-Windows-949' -> Unicode
converter (whether or not this is desirable and necessary depends
on what applications Encode may be used for).
However, in the other direction (Unicode -> euc-kr)
the converter has to comply strictly with the standard.
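The asymmetry between the two directions could look roughly like the following, again sketched in Python rather than Perl. Both function names are hypothetical. The decoder treats data labeled 'euc-kr' as CP949 when asked to be generous, and the checker accepts output only if every non-ASCII character is a pair of bytes in the range 0xA1..0xFE, which is the range EUC-KR uses. This is a sketch under those assumptions, not a description of how Encode does or should implement it.

```python
def decode_labeled_euc_kr(data, lenient=True):
    """Decode bytes that arrived labeled as 'euc-kr'.

    With lenient=True the bytes are decoded as CP949, the extension of
    EUC-KR, so mislabeled X-Windows-949 data still decodes.
    """
    return data.decode("cp949" if lenient else "euc_kr")


def is_strict_euc_kr(data):
    """Return True if data uses only ASCII and 0xA1..0xFE byte pairs."""
    i = 0
    while i < len(data):
        if data[i] < 0x80:
            # Plain ASCII byte.
            i += 1
            continue
        if i + 1 >= len(data):
            # Truncated two-byte character.
            return False
        lead, trail = data[i], data[i + 1]
        if not (0xA1 <= lead <= 0xFE and 0xA1 <= trail <= 0xFE):
            # A byte outside 0xA1..0xFE means a CP949 extension character.
            return False
        i += 2
    return True


# A title containing a syllable that CP949 adds on top of KS X 1001.
sample = "\ub620\ubc29\uac01\ud558"
raw = sample.encode("cp949")
text = decode_labeled_euc_kr(raw)   # generous on input
ok = is_strict_euc_kr(raw)          # but these bytes must not go out as euc-kr
```

Here the generous decoder recovers the text, while the strict check rejects the same bytes as EUC-KR output.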
See <http://bugzilla.mozilla.org/show_bug.cgi?id=131388>
Jungshik Shin