perl-unicode

Re: [PATCH][Supported.pod] Encoding classification update

2002-03-24 01:27:23
Ooops, here goes the patch :-)

--- ext/Encode/lib/Encode/Supported.pod.orig    Sat Mar 23 01:51:30 2002
+++ ext/Encode/lib/Encode/Supported.pod Sun Mar 24 10:12:25 2002
@@ -202,55 +202,87 @@
 
 =head1 Encoding Classification (by Anton Tagunov)
 
-Encodings
+This section tries to classify the supported encodings by their 
+applicability for information exchange over the Internet and to 
+choose the most suitable aliases to name them in the context of 
+such communication.
+
+Encoding names
+
+  US-ASCII    UTF-8       
+  ISO-8859-*  KOI8-R
+  Shift_JIS   EUC-JP  ISO-2022-JP
+  EUC-KR 
+  Big5
+
+are L<http://www.iana.org/assignments/character-sets>-registered as
+preferred MIME names and may probably be used over the Internet.
+C<Shift_JIS> is no longer Microsoft-proprietary, since it has been
+made official by JIS X 0208-1997. It is probably the most
+widespread encoding for Japanese on the Internet.
+
+  EUC-CN
+
+is not registered with IANA under this name (as of March 2002) but
+seems to be supported by major web browsers. (IANA has registered
+this encoding as C<GB2312>, but C<gb2312> currently has a different
+meaning in the C<Encode> module. It will probably become an alias for
+C<EUC-CN> in the future; until then it is safer to avoid using
+C<gb2312> as an encoding name within Perl.)
+
+  UTF-16 
+  KOI8-U        (http://www.faqs.org/rfcs/rfc2319.html)
+
+are IANA-registered (C<UTF-16> even as a preferred MIME name)
+but should probably be avoided as encodings for web pages due to
+lack of browser support.
 
-  US-ASCII    UTF-8       KOI8-R      ISO-8859-*
-  ISO-2022-CN ISO-2022-JP Big5
-  EUC-CN      EUC-JP      EUC-KR
-
-are <http://www.iana.org/assignments/character-sets>-registered as
-preferred MIME names and may probably be used  over the Internet.  So is
-
-  Shift_JIS
-
-but despite its wide spread it bears the label of being
-Microsft proprietary -- was.  Now Shift JIS is official as of
-JIS X 0208-1997.
-
-         UTF-16 KOI8-U
-
-are IANA-registered preferred MIME names but probably
-shoule be avoided as encoding for web pages due to lack of
-browser support.
-
-  ISO-2022      (http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM)
-  ISO-2022-JP-1 (http://www.faqs.org/rfcs/rfc2237.html)
   ISO-IR-165    (http://www.faqs.org/rfcs/rfc1345.html)
   GBK
   VISCII
-  GB 12345      (only plains 1 and 2 available)
-  GB 18030
-  CNS 11643
+  GB 12345
+  GB 18030 (*)  (see links below)
+  EUC-TW   (*)
 
 are totally valid encodings but not registered at IANA.
+The names under which they are listed here are probably the
+most widely known names for these encodings and are the
+recommended names.
+
 
-   BIG5PLUS
-   EUC-JP-0212   (Encode::lib::Encode::Tcl::Extended)
 
-are a bit proprietary
+=for comment these used to be listed as supported but
+do not work @15457; once the situation is clear they will be
+uncommented or deleted - Anton
+ISO-2022      (http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM)
+ISO-2022-JP-1 (http://www.faqs.org/rfcs/rfc2237.html)
+CNS 11643     (only planes 1 and 2 available)
+
+  BIG5PLUS (*)
+
+is a somewhat proprietary name. C<(*)>-marked encodings belong to
+C<Encode::HanExtra>, available from CPAN.
 
 You may probably get some info on CJK encodings at
 
 brief description for most of the mentioned CJK encodings
-
-F<http://www.debian.org.ru/doc/manuals/intro-i18n/ch-codes.html>
+L<http://www.debian.org.ru/doc/manuals/intro-i18n/ch-codes.html>
 
 several years old, but still useful
-
-F<http://www.oreilly.com/people/authors/lunde/cjk_inf.html>
+L<http://www.oreilly.com/people/authors/lunde/cjk_inf.html>
 
 and some in-depth reading for the heroes :-)
-F<http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM> (eq ISO-2022)
+L<http://www.ecma.ch/ecma1/STAND/ECMA-035.HTM> (eq C<ISO-2022>)
+
+gives brief info on C<EUC-CN>, C<GBK> and mostly on C<GB 18030>
+L<ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf>
+
+The nature of information in this section is most fragile and
+error-prone; I<probably> is the most popular adverb :)
+Please feel free to send your comments, disagreements and
+additions to L<...>. (Note, however,
+that the mission of this document is to cover the
+C<Encode>-supported encodings only.)
 
 =head1 See Also