On Thu, 4 Apr 2002, Anton Tagunov wrote:
Hi Anton,
Thanks a lot.
- changes status of KOI8-U on Jungshik's comment
(sorry, I have never tested that myself :-(
I haven't tested it either :-), but both Mozilla/Netscape6 and MS IE
list it in their View|Encoding menu, which I take to mean that they
support it.
UTF-16
- KOI8-U (http://www.faqs.org/rfcs/rfc2319.html)
-are IANA-registered (C<UTF-16> even as a preferred MIME name)
+=for comment
+waiting for comments from Jungshik Shin to soften this - Anton
+
+is a IANA-registered preferred MIME name
but probably should be avoided as encoding for web pages due to
-the lack of browser supports.
+the lack of browser support.
Your test probably didn't work with MS IE because you didn't prepend
a BOM (byte order mark) to your UTF-16 HTML document.
Note that the conventional way of telling web browsers the MIME charset,
a <meta> tag, doesn't work for UTF-16/UTF-32: these encodings are not
ASCII-compatible, so the browser can't read the tag before it already
knows the encoding. Either you have to configure your web server to emit
a Content-Type header with 'charset=UTF-16(LE|BE)' or you have to put
a BOM at the beginning.
When BOM is present, MS IE 5/6, Mozilla/Netscape6 and Netscape4
have no problem rendering UTF-16(LE|BE) encoded pages. I put
up a couple of test pages at
http://jshin.net/i18n/utf16le_kr2.html
http://jshin.net/i18n/utf16be_kr2.html
For more details on UTF-16 and HTML, you can refer to HTML4 spec. at
http://www.w3.org/TR/html4/charset (see section 5.2.1)
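To make the BOM point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not part of the patch). The helper names encode_utf16_with_bom and sniff_utf16_bom are made up for this example, and the sniffing is a simplification of what a browser does.

```python
import codecs


def encode_utf16_with_bom(html: str, little_endian: bool = True) -> bytes:
    """Encode an HTML string as UTF-16 and prepend the matching BOM."""
    if little_endian:
        return codecs.BOM_UTF16_LE + html.encode("utf-16-le")
    return codecs.BOM_UTF16_BE + html.encode("utf-16-be")


def sniff_utf16_bom(data: bytes):
    """Return the UTF-16 byte order implied by a leading BOM, or None.

    Simplified: a real sniffer would also check for the UTF-32 BOMs,
    since the UTF-32LE BOM begins with the same two bytes as UTF-16LE.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


page = "<html><body>한글</body></html>"
le_bytes = encode_utf16_with_bom(page, little_endian=True)
be_bytes = encode_utf16_with_bom(page, little_endian=False)

# With the BOM in place, the byte order can be recovered without any
# <meta> tag or HTTP header.
detected = sniff_utf16_bom(le_bytes)
decoded = le_bytes[2:].decode(detected)
```

Without the BOM (and without a charset in the Content-Type header), sniff_utf16_bom returns None and the receiver has nothing to go on.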
As I wrote before, I have no intention to encourage use of UTF-16 over
UTF-8 although some people whose primary script has a more 'economical'
(in terms of file size) representation in UTF-16 than in UTF-8 may want
to use it.
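The size difference is easy to check. The following is only an illustrative Python sketch; the sample strings are picked for this example. Hangul syllables take 3 bytes each in UTF-8 but 2 bytes each in UTF-16, whereas ASCII markup takes 1 byte per character in UTF-8 and 2 in UTF-16.

```python
# Sample text: 5 Hangul syllables plus 1 ASCII space.
hangul = "통합 완성형"

# UTF-8: 5 syllables * 3 bytes + 1 space * 1 byte = 16 bytes.
hangul_utf8 = len(hangul.encode("utf-8"))
# UTF-16 (no BOM): 6 code units * 2 bytes = 12 bytes.
hangul_utf16 = len(hangul.encode("utf-16-le"))

# For ASCII-only markup the picture is reversed: UTF-16 doubles the size.
markup = "<html><body></body></html>"
markup_utf8 = len(markup.encode("utf-8"))
markup_utf16 = len(markup.encode("utf-16-le"))

print(hangul_utf8, hangul_utf16)
print(markup_utf8, markup_utf16)
```

So the saving depends on how much of a page is actual Korean text as opposed to ASCII markup.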
+=head2 Microsoft-related naming mess
+
+Microsoft products misuse the following names:
+
+=over 2
+
+=item KS_C_5601-1987
+
+Microsoft extension to C<EUC-KR>.
+
+Proper name: C<CP949>.
+
+See
+http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html
+for details.
Wow, I didn't know that Martin wrote this. Thanks a lot for
digging this up. He 'rediscovered' what a lot of people in Korea had
complained about. The one thing I don't agree with him on is which
designation to use for CP949. I think it had better be 'windows-949'
because that's more in line with other MS code pages such as windows-125x
(for European scripts). By the same token, the MS version of Shift_JIS can
be labeled 'windows-932'. At the moment, Mozilla uses 'x-windows-949' for
CP949/UHC because it's not yet registered with IANA. I should probably
contact Martin and discuss this issue.
+Encode aliases C<KS_C_5601-1987> to C<cp949> to reflect
+this common misusage.
If my patch is accepted, cp949 will have a couple more aliases,
'uhc' and '(x-)windows-949'. CP949 is commonly known as
'통합 완성형' (Unified Hangul Code) in Korea.
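As a rough illustration of what such aliasing amounts to, here is a hypothetical sketch in Python. It is not how Encode implements aliases; the table and the resolve function are made up for this example, and only the names themselves come from this thread.

```python
# Hypothetical alias table: every label maps to one canonical name.
ALIASES = {
    "ks_c_5601-1987": "cp949",   # the commonly misused label
    "uhc": "cp949",              # Unified Hangul Code
    "windows-949": "cp949",
    "x-windows-949": "cp949",    # label Mozilla uses pending IANA registration
}


def resolve(name: str) -> str:
    """Map an encoding label to its canonical name, ignoring case.

    Labels with no alias entry are returned lowercased and unchanged.
    """
    key = name.strip().lower()
    return ALIASES.get(key, key)


print(resolve("KS_C_5601-1987"))
print(resolve("X-Windows-949"))
```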
+I<Raw> C<KS_C_5601-1987> encoding is available as C<kcs5601-raw>.
ksc5601-raw had better be renamed ksx1001-raw, with ksc5601-raw kept
as an alias for ksx1001-raw. Please note that what's now called
ksc5601-raw has two new characters that were only added in Dec. 1998,
over a year after the name change (KS C 5601 -> KS X 1001).
+=item GB2312
+
+Encode aliases C<GB2312> to C<euc-cn> in full agreement with
+IANA registration. C<cp936> is supported separately.
+I<Raw> C<GB_2312-80> encoding is available as C<kcs5601-raw>.
Oops... You meant gb2312-raw, didn't you? :-)
Jungshik, I would have certainly advocated linking not only to
http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html
but also to your comments on the KS_C_5601-1987 in the list archive,
but all your mails were on several subjects each.
Jungshik> ... refer to Ken Lunde's CJKV Information Processing
Jungshik> about that 'epic war' between two camps. (see p.197 of
Jungshik> the book and http://jshin.net/faq/qa8.html)
Jungshik> We even set up a web page to prevent M$ from spreading that
Jungshik> ill-defined name.
Maybe we could link to this page? What is the address?
The campaign web page has since disappeared. It was almost 5 years
ago :-). However, subject 8 of my Hangul FAQ deals with the issue
(http://jshin.net/faq/qa8.html), so you may link to that instead.
Be aware, though, that it's been untouched for a few years (if not longer)
and needs a complete overhaul.