Regular maintenance work. :-)
Thanks,
/Autrijus/
--- encoding.pm.orig Sat Mar 13 21:46:21 2004
+++ encoding.pm Sat Mar 13 22:02:58 2004
@@ -158,11 +158,11 @@
=item *
Changing PerlIO layers of C<STDIN> and C<STDOUT> to the encoding
- specified.
+specified.
=back
-=head2 Literal Conversions
+=head2 Literal conversions
You can write code in EUC-JP as follows:
@@ -246,9 +246,9 @@
Sets the script encoding to I<ENCNAME>. And unless ${^UNICODE}
exists and non-zero, PerlIO layers of STDIN and STDOUT are set to
-":encoding(I<ENCNAME>)".
+C<:encoding(I<ENCNAME>)>.
-Note that STDERR WILL NOT be changed.
+Note that STDERR will I<not> be changed.
Also note that non-STD file handles remain unaffected. Use C<use
open> or C<binmode> to change layers of those.
@@ -279,7 +279,7 @@
=item no encoding;
Unsets the script encoding. The layers of STDIN, STDOUT are
-reset to ":raw" (the default unprocessed raw stream of bytes).
+reset to C<:raw> (the default unprocessed raw stream of bytes).
=back
@@ -291,7 +291,7 @@
in UTF-8 -- or use a source filter. That's what 'Filter=>1' does.
What does this mean? Your source code behaves as if it is written in
-UTF-8 with 'use utf8' in effect. So even if your editor only supports
+UTF-8 with C<use utf8> in effect. So even if your editor only supports
Shift_JIS, for example, you can still try examples in Chapter 15 of
C<Programming Perl, 3rd Ed.>. For instance, you can use UTF-8
identifiers.
@@ -327,12 +327,12 @@
B<use encoding> can appear as many times as you want in a given script.
The multiple use of this pragma is discouraged.
-By the same reason, the use this pragma inside modules is also
-discouraged (though not as strongly discouranged as the case above.
-See below).
+For the same reason, the use of this pragma inside modules is also
+discouraged, although not as strongly as in the case above
+(see below).
If you still have to write a module with this pragma, be very careful
-of the load order. See the codes below;
+of the load order. A common mistake is shown below:
# called module
package Module_IN_BAR;
@@ -345,16 +345,16 @@
use Module_IN_BAR;
# surprise! use encoding "bar" is in effect.
-The best way to avoid this oddity is to use this pragma RIGHT AFTER
-other modules are loaded. i.e.
+The best way to avoid this oddity is to use this pragma I<right after>
+other modules are loaded, like this:
use Module_IN_BAR;
use encoding "foo";
=head2 DO NOT MIX MULTIPLE ENCODINGS
-Notice that only literals (string or regular expression) having only
-legacy code points are affected: if you mix data like this
+This pragma only affects literals (string or regular expression) composed
+solely of legacy code points. If you mix data like this:
\xDF\x{100}
@@ -363,39 +363,39 @@
"\xDF" =~ /\x{3af}/
-but this will not
+but this will not:
"\xDF\x{100}" =~ /\x{3af}\x{100}/
-since the C<\xDF> (ISO 8859-7 GREEK SMALL LETTER IOTA WITH TONOS) on
-the left will B<not> be upgraded to C<\x{3af}> (Unicode GREEK SMALL
-LETTER IOTA WITH TONOS) because of the C<\x{100}> on the left. You
-should not be mixing your legacy data and Unicode in the same string.
+Because of the C<\x{100}> on the right side, C<\xDF> (ISO 8859-7 GREEK
+SMALL LETTER IOTA WITH TONOS) on the left will B<not> be upgraded to
+C<\x{3af}> (Unicode GREEK SMALL LETTER IOTA WITH TONOS). You should
+not be mixing your legacy data and Unicode in the same string.
-This pragma also affects encoding of the 0x80..0xFF code point range:
-normally characters in that range are left as eight-bit bytes (unless
+This pragma also affects encoding of the C<0x80>..C<0xFF> code point range.
+Characters in that range are normally left as eight-bit bytes (unless
they are combined with characters with code points 0x100 or larger,
-in which case all characters need to become UTF-8 encoded), but if
-the C<encoding> pragma is present, even the 0x80..0xFF range always
-gets UTF-8 encoded.
+in which case all characters will be upgraded to Unicode), but if
+the C<encoding> pragma is present, code points in the C<0x80>..C<0xFF>
+range will always be decoded into Unicode strings, using the specified encoding.
After all, the best thing about this pragma is that you don't have to
-resort to \x{....} just to spell your name in a native encoding.
+resort to C<\x{....}> just to spell your name in a native encoding.
So feel free to put your strings in your encoding in quotes and
regexes.
=head2 tr/// with ranges
The B<encoding> pragma works by decoding string literals in
-C<q//,qq//,qr//,qw///, qx//> and so forth. In perl 5.8.0, this
-does not apply to C<tr///>. Therefore,
+C<q//>, C<qq//>, C<qr//>, C<qw//>, C<qx//> and so forth. In perl
+5.8.0, this did not apply to C<tr///>. Therefore,
use encoding 'euc-jp';
#....
$kana =~ tr/\xA4\xA1-\xA4\xF3/\xA5\xA1-\xA5\xF3/;
# -------- -------- -------- --------
-Does not work as
+did not work as
$kana =~ tr/\x{3041}-\x{3093}/\x{30a1}-\x{30f3}/;
@@ -414,7 +414,7 @@
This counterintuitive behavior has been fixed in perl 5.8.1.
-=head3 workaround to tr///;
+=head3 Workaround to tr///;
In perl 5.8.0, you can work around as follows;
@@ -469,7 +469,7 @@
=over
-=item literals in regex that are longer than 127 bytes
+=item Literals in regex that are longer than 127 bytes
For native multibyte encodings (either fixed or variable length),
the current implementation of the regular expressions may introduce
@@ -481,7 +481,7 @@
(Porters who are willing and able to remove this limitation are
welcome.)
-=item format
+=item Format
This pragma doesn't work well with format because PerlIO does not
get along very well with it. When format contains non-ascii