perl-unicode

Re: Announce: Perl, Unicode and I18N FAQ

1999-12-17 15:58:29

James Briggs wrote,

Guys:

Please review this FAQ and reply with your comments.

Happy Holidays,

James.

http://rf.net/~james/perli18n.html

Here are some suggestions in the form of a uni-diff patch (note that I've
introduced some line breaks).

BTW, I think there was a perli18n.pod doc distributed with an earlier
version of perl.

Peter Prymmer

--- perli18n.html.orig  Fri Dec 17 14:15:27 1999
+++ perli18n.html       Fri Dec 17 14:43:13 1999
@@ -72,7 +72,14 @@
 <a name="Q1"></a>
 <b>Q1. I think that I'm a clever programmer. What's so hard about internationalization?</b>
 <p>
-A1. Internationalizing a product involves issues about program design, application language features, cultural practices, fonts and often legacy clients. Most programmers face a rude awakening when first internationalizing an application after a career of only ASCII. Little details often become big headaches.  The most important thing you can do is start your international design early.
+A1. Internationalizing a product involves issues about program design, 
+application language features, cultural practices, fonts and often legacy 
+clients. Most programmers face a rude awakening when first internationalizing 
+an application after a career of only ASCII.  Little details often become big 
+headaches.  For example, a lot of code gets written with the assumption that 
+there are only twelve months per year, yet the Hebrew calendar has thirteen 
+months in a leap year.  The most important thing you can do is start your 
+international design early.
 <p>
 For a typical tale of woe, see:
 <a href="http://www-4.ibm.com/software/developer/library/internationalization-support.html">http://www-4.ibm.com/software/developer/library/internationalization-support.html</a>
@@ -346,7 +353,8 @@
 <p>
 A9. Yes, see perldoc perlre.
 <p>
-In short, \w is locale-dependent.
+In short, \w and \s (along with their complements \W and \S, respectively) are 
+locale-dependent.
 <p>
 <a name="Q10"></a>
 <b>Q10. Do regular expressions work with Unicode?</b>
@@ -426,7 +434,7 @@
     print $no_map->tou("V}re norske tegn b|r {res\n")->utf8;
 </pre>
 <p>
-Gisele Aas wrote Unicode::Map8.
+Gisle Aas wrote Unicode::Map8.
 <p>
 <b>Unicode::String</b>
 <p>
@@ -446,7 +454,7 @@
     print "Latin1: ", $u->latin1, "\n"; # lossy
     print "Hex: ",    $u->hex,    "\n"; # a hexadecimal string
 </pre>
-Gisele Aas wrote Unicode::String.
+Gisle Aas wrote Unicode::String.
 <p>
 <b>I18N::Collate</b>
 <p>
@@ -507,7 +515,7 @@
 See Frank Tang's site for some UTF-8 auto detection scripts:
 <a href="http://people.netscape.com/ftang/i18n.html#detect">http://people.netscape.com/ftang/i18n.html#detect</a>
 <p>
-See Ken Lunde's CJKV book for some character encoding detection regular expressions.
+See Ken Lunde's CJKV book for some character encoding detection regular expressions in Perl.
 <p>
 <a name="Q17"></a>
 <b>Q17. Is Unicode big endian or little endian?</b>
@@ -518,13 +526,16 @@
 <b>Q18. Is there an EBCDIC-safe transformation of Unicode?</b>
 <p>
 A18. Yes. UTF-EBCDIC stands for EBCDIC-friendly Unicode (or UCS) Transformation Format.
-See Unicode Technical Report #16. <a href="http://www.unicode.org/unicode/reports/tr16/">http://www.unicode.org/unicode/reports/tr16/</a>
+See Unicode Technical Report #16. <a href="http://www.unicode.org/unicode/reports/tr16/">http://www.unicode.org/unicode/reports/tr16/</a>.
+Unfortunately there is no Perl implementation of the proposed UTF-EBCDIC
+transform as of version 5.005_63.  Also, the <CODE>use utf8;</CODE>
+pragma is unlikely to be very useful on EBCDIC Perls.
 <p>
 Where to Use UTF-EBCDIC?
 <p>
 UTF-EBCDIC is intended to be used inside EBCDIC systems or in closed networks where there is a dependency on EBCDIC hard-coding assumptions. It is not meant to be used for open interchange among heterogeneous platforms using different data encodings. Due to specific requirements for ASCII encoding for line endings in some Internet protocols, UTF-EBCDIC is unsuitable for use over the Internet using such protocols. UTF-8 or UTF-16 forms should be used in open interchange.
 <p>
-See the Perl EBCDIC FAQ at <a href="http://www.best.com/~pvhp/os390/doc/perlebcdic.pod.txt">http://www.best.com/~pvhp/os390/doc/perlebcdic.pod.txt</a>
+See the Perl EBCDIC document at <a href="http://www.best.com/~pvhp/os390/doc/perlebcdic.pod.txt">http://www.best.com/~pvhp/os390/doc/perlebcdic.pod.txt</a>
 <p>
 <a name="Q19"></a>
 <b>Q19. Are there security implications in I18n?</b>
End of Patch.
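
The point in the A9 hunk that \w is locale-dependent can be checked with a few lines of Perl. This is a minimal sketch, not proposed FAQ text: count_word_chars is a hypothetical helper, and de_DE.ISO-8859-1 is an assumed locale name that may not be installed on a given system (the helper then returns undef). Under the C locale the e-acute byte should not count as a word character; under an ISO-8859-1 locale that classifies it as alphanumeric, it should.

```perl
#!/usr/bin/perl
# Sketch: under "use locale", what \w matches depends on LC_CTYPE.
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: count how many characters of a byte string
# match \w while LC_CTYPE is set to $locale_name.
# Returns undef if setlocale() rejects the locale (e.g. not installed).
sub count_word_chars {
    my ($locale_name, $bytes) = @_;
    my $saved = setlocale(LC_CTYPE);              # remember current LC_CTYPE
    defined setlocale(LC_CTYPE, $locale_name) or return undef;
    my $count;
    {
        use locale;                               # \w now consults LC_CTYPE
        $count = () = $bytes =~ /\w/g;            # list-context match count
    }
    setlocale(LC_CTYPE, $saved);                  # restore the previous locale
    return $count;
}

my $text = "caf\xE9";    # "cafe" with e-acute, as ISO-8859-1 bytes
for my $loc ('C', 'de_DE.ISO-8859-1') {           # second name is an assumption
    my $n = count_word_chars($loc, $text);
    printf "%-18s %s\n", $loc,
        defined $n ? "$n of 4 characters match \\w" : "locale not installed";
}
```

Saving and restoring LC_CTYPE keeps the helper from leaking its locale change into the rest of a program.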