Hello, Dan!
1)
This is my second batch of comments on the revised Supported.pod.
This part is 100% orthogonal to the first one.
2)
This patch
- changes the status of KOI8-U following Jungshik's comment
(sorry, I have never tested that myself :-(
- upgrades GB2312 to a "first-class citizen"
(why not?)
- adds a section on the Microsoft naming acrobatics
- includes a comment on the differences between Shift_JIS
as defined in JIS X 0208-1997 Appendix 1
and cp932
- ...
- also makes clear that Encode supports the standard
GB2312 and Big5, not the Microsoft
extensions (have I got that right? :-)
--- ext/Encode/lib/Encode/Supported.pod.orig Mon Apr 1 03:42:52 2002
+++ ext/Encode/lib/Encode/Supported.pod Thu Apr 4 15:16:10 2002
@@ -308,8 +308,8 @@
=item *
-To (en|de) code Encodings marked as C<*>, You need C<Encode::HanExtra>
-,available from CPAN.
+To (en|de)code encodings marked as C<(*)>, you need
+C<Encode::HanExtra>, available from CPAN.
=back
@@ -317,33 +317,43 @@
US-ASCII UTF-8 ISO-8859-* KOI8-R
Shift_JIS EUC-JP ISO-2022-JP ISO-2022-JP-1
- EUC-KR Big5
+ EUC-KR Big5 GB2312
-are registered to IANA as preferred MIME names and may probably be used over
-the Internet.
+are registered with IANA as preferred MIME names and can probably
+be used over the Internet.
-C<Shift_JIS> is no longer Microsft proprietary since it has been
-officialized by JIS X 0208-1997.
+C<Shift_JIS> has been officially standardized in JIS X 0208-1997.
+L<Microsoft-related naming mess> gives details.
+
+C<GB2312> is the IANA name for C<EUC-CN>.
+See L<Microsoft-related naming mess> for details.
+
+C<GB_2312-80> I<raw> encoding is available as C<gb2312-raw>
+with Encode. See L<Encode::CN -- Continental China> for details.
EUC-CN
+ KOI8-U (http://www.faqs.org/rfcs/rfc2319.html)
-has not been registered with IANA (as of march 2002) but
-seems to be supported by major web browsers. In Encode, GB2312
-is aliased to EUC-CN, with "uncooked" version of GB2312 canonicalized
-as gb2312-raw. See L<Encode::CN> for details.
+have not been registered with IANA (as of March 2002) but
+seem to be supported by major web browsers.
+The IANA name for C<EUC-CN> is C<GB2312>.
KS_C_5601-1987
-has been registered to IANA but when they are used, they are
-EUC-coded. Internet community in Korea is not happy with this.
-so C<KS_C_5601-1987> is aliased to C<cp949>, an enhanced version
-of C<euc-kr>, with ksc5601-raw for "uncooked".
+is heavily misused.
+See L<Microsoft-related naming mess> for details.
+
+C<KS_C_5601-1987> I<raw> encoding is available as C<ksc5601-raw>
+with Encode. See L<Encode::KR -- Korea> for details.
UTF-16
- KOI8-U (http://www.faqs.org/rfcs/rfc2319.html)
-are IANA-registered (C<UTF-16> even as a preferred MIME name)
+=for comment
+waiting for comments from Jungshik Shin to soften this - Anton
+
+is an IANA-registered preferred MIME name
but probably should be avoided as encoding for web pages due to
-the lack of browser supports.
+the lack of browser support.
ISO-IR-165 (http://www.faqs.org/rfcs/rfc1345.html)
GBK
@@ -360,6 +370,73 @@
BIG5PLUS (*)
is a bit proprietary name.
+
+=head2 Microsoft-related naming mess
+
+Microsoft products misuse the following names:
+
+=over 2
+
+=item KS_C_5601-1987
+
+Microsoft extension to C<EUC-KR>.
+
+Proper name: C<CP949>.
+
+See
+http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html
+for details.
+
+Encode aliases C<KS_C_5601-1987> to C<cp949> to reflect
+this common misuse.
+I<Raw> C<KS_C_5601-1987> encoding is available as C<ksc5601-raw>.
+
+See L<Encode::KR -- Korea> for details.
+
+=item GB2312
+
+Microsoft extension to C<EUC-CN>.
+
+Proper names: C<CP936>, C<GBK>.
+
+IANA has registered C<GB2312> as a name for C<EUC-CN>.
+This has partially repaired the situation: Microsoft's
+C<GB2312> is now a superset of the official C<GB2312>.
+
+Encode aliases C<GB2312> to C<euc-cn> in full agreement with
+IANA registration. C<cp936> is supported separately.
+I<Raw> C<GB_2312-80> encoding is available as C<gb2312-raw>.
+
+See L<Encode::CN -- Continental China> for details.
+
+=item Big5
+
+Microsoft extension to C<Big5>.
+
+Proper name: C<CP950>.
+
+Encode separately supports C<Big5> and C<cp950>.
+
+=item Shift_JIS
+
+Microsoft's understanding of C<Shift_JIS>.
+
+However, JIS has not endorsed the full Microsoft standard.
+The official C<Shift_JIS> includes only the JIS X 0201 and JIS X 0208
+character sets, while Microsoft has always meant C<Shift_JIS> to
+encode a wider character repertoire.
+
+As the historical predecessor, Microsoft's variant
+probably has a stronger claim to the name, although one may object
+that Microsoft should not have used JIS as part of the name
+in the first place.
+
+Unambiguous name: C<CP932>.
+
+Encode separately supports C<Shift_JIS> and C<cp932>.
+
+=back
+
=head1 Bookmarks
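To illustrate the Shift_JIS vs cp932 point outside Perl, here is a small sketch using Python's own "shift_jis" and "cp932" codecs (not Encode). I am assuming that Python's "shift_jis" follows JIS X 0201 + JIS X 0208 and that "cp932" follows Microsoft's table; the helper name try_encode is just for illustration.

```python
# Sketch (assumption): Python's "shift_jis" codec covers JIS X 0201 + JIS X 0208,
# while its "cp932" codec follows Microsoft's extended Shift_JIS table.

def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec cannot encode text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

# U+2460 CIRCLED DIGIT ONE is an NEC/Microsoft extension outside JIS X 0208,
# so only the Microsoft variant can encode it.
circled_one = "\u2460"
print("shift_jis:", try_encode(circled_one, "shift_jis"))  # None
print("cp932:    ", try_encode(circled_one, "cp932"))

# The same two bytes also decode differently: JIS X 0208 maps 0x8160 to
# WAVE DASH, Microsoft's table maps it to FULLWIDTH TILDE.
wave = b"\x81\x60"
print(hex(ord(wave.decode("shift_jis"))))
print(hex(ord(wave.decode("cp932"))))
```

So the two names really do denote different repertoires and even different mappings, which is why Encode keeps C<Shift_JIS> and C<cp932> apart.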
What do you think of it, Dan? :-)
3)
Jungshik, I would certainly have liked to link not only to
http://lists.w3.org/Archives/Public/ietf-charsets/2001AprJun/0033.html
but also to your comments on KS_C_5601-1987 in the list archive,
but each of your mails covered several subjects.
Jungshik> ... refer to Ken Lunde's CJKV Information Processing
Jungshik> about that 'epic war' between two camps. (see p.197 of
Jungshik> the book and http://jshin.net/faq/qa8.html)
Jungshik> We even set up a web page to prevent M$ from spreading that
Jungshik> ill-defined name.
Maybe we could link to that page? What is its address?
4)
The
[ID 20020312.006] pod2html does not translate space to '_' in L<>-s
bug certainly still breaks our links. I have sent another mail about it to
perl5-porters.
Furthermore, I don't understand why C<gb2312-raw> converts
to <CODE>gb2312-raw</CODE> while C<GB2312> becomes a link.
Anyway, I have chosen to wrap the names in C<>, but if that feature/bug
persists, it may be better to drop the C<> from my patch.
- Anton