perl-unicode

Re: Unicode vs. \s [Was: Re: Encode::Unicode::UTF7]

2003-05-17 20:30:04
On Sunday, May 18, 2003, at 11:25  AM, SADAHIRO Tomoyuki wrote:
\s => qr/[\n\r\t\ ]/x;
It's a bad idea to use \x00-\x20.

cf. RFC 2152 (UTF-7) says

      Rule 3: The space (decimal 32), tab (decimal 9), carriage return
      (decimal 13), and line feed (decimal 10) characters may be
      directly represented by their ASCII equivalents. However, note
      that MIME content transfer encodings have rules concerning the use
      of such characters. Usage that does not conform to the
      restrictions of RFC 822, for example, would have to be encoded
      using MIME content transfer encodings other than 7bit or 8bit,
      such as quoted-printable, binary, or base64.

Acknowledged.

$specials missing '/'.

Same here.

In the case of Unicode::String,
a referent in $self is encoded in UCS-2.
So chr(0x3000) never occurs.

I see.

Thanks for linting.  Patch follows.

Dan the Encode Maintainer

--- ../perl-current/ext/Encode/lib/Encode/Unicode/UTF7.pm Sun May 18 02:57:07 2003
+++ lib/Encode/Unicode/UTF7.pm  Sun May 18 11:52:58 2003
@@ -15,13 +15,13 @@
 #

 our $OPTIONAL_DIRECT_CHARS = 1;
-my $specials =   quotemeta "\'(),-.:?";
+my $specials =   quotemeta "\'(),-./:?";
 $OPTIONAL_DIRECT_CHARS and
     $specials .= quotemeta "!\"#$%&*;<=>@[]^_`{|}";
 # \s will not work because it matches U+3000 DEOGRAPHIC SPACE
-# We use \x00-\x20 instead (controls + space)
-my $re_asis =     qr/(?:[\x00-\x20A-Za-z0-9$specials])/;
-my $re_encoded = qr/(?:[^\x00-\x20A-Za-z0-9$specials])/;
+# We use qr/[\n\r\t\ ] instead
+my $re_asis =     qr/(?:[\n\r\t\ A-Za-z0-9$specials])/;
+my $re_encoded = qr/(?:[^\n\r\t\ A-Za-z0-9$specials])/;
 my $e_utf16 = find_encoding("UTF-16BE");

 sub needs_lines { 1 };
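
To illustrate what the patch changes, here is a minimal sketch that probes a few characters against \s and against the pre-patch and post-patch character classes. The variable names ($old_specials, $new_specials, $re_old, $re_new, %probe) are made up for this example and do not appear in UTF7.pm; only the character classes themselves are taken from the diff above, and the optional direct characters are left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical names; only the character classes mirror the diff.
my $old_specials = quotemeta q{'(),-.:?};     # pre-patch: no '/'
my $new_specials = quotemeta q{'(),-./:?};    # post-patch: '/' added

# Pre-patch: every C0 control plus space is passed through as-is.
my $re_old = qr/[\x00-\x20A-Za-z0-9$old_specials]/;
# Post-patch: only the four characters named in RFC 2152 Rule 3.
my $re_new = qr/[\n\r\t\ A-Za-z0-9$new_specials]/;

my %probe = (
    'U+0000 NUL'               => "\x00",
    'U+0009 TAB'               => "\t",
    'U+0020 SPACE'             => " ",
    'U+002F SOLIDUS'           => "/",
    'U+3000 IDEOGRAPHIC SPACE' => "\x{3000}",
);

# Print one row per probe: does it match \s, the old class, the new class?
for my $name (sort keys %probe) {
    my $ch = $probe{$name};
    printf "%-26s \\s=%d old=%d new=%d\n", $name,
        ($ch =~ /\s/     ? 1 : 0),
        ($ch =~ $re_old  ? 1 : 0),
        ($ch =~ $re_new  ? 1 : 0);
}
```

The point of the comparison: \s is too wide because it accepts U+3000, \x00-\x20 is too wide because it accepts controls such as NUL that Rule 3 does not list, and the explicit [\n\r\t\ ] class accepts exactly the four characters the RFC names.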
