On Saturday, May 17, 2003, at 03:18 AM, Dan Kogai wrote:
Whole module right after my signature. Based upon Unicode::String.
"Some"body please fill in the POD section. Test suite is even more
welcome.
Dan the Encode Maintainer
I found that we can't use \s in $re_asis because \s matches U+3000
(IDEOGRAPHIC SPACE), which needs to be encoded. A few RFC readings
later, I concluded that we should use \x00-\x20 instead: the ASCII
control characters below U+0020 plus the space itself. Here is the patch.
===================================================================
RCS file: lib/Encode/Unicode/UTF7.pm,v
retrieving revision 0.1
diff -u -r0.1 lib/Encode/Unicode/UTF7.pm
--- lib/Encode/Unicode/UTF7.pm 2003/05/16 18:06:24 0.1
+++ lib/Encode/Unicode/UTF7.pm 2003/05/17 14:21:21
@@ -18,8 +18,10 @@
 my $specials = quotemeta "\'(),-.:?";
 $OPTIONAL_DIRECT_CHARS and
 $specials .= quotemeta "!\"#$%&*;<=>@[]^_`{|}";
-my $re_asis = qr/(?:[\sA-Za-z0-9$specials])/;
-my $re_encoded = qr/(?:[^\sA-Za-z0-9$specials])/;
+# \s will not work because it matches U+3000 IDEOGRAPHIC SPACE
+# We use \x00-\x20 instead (controls + space)
+my $re_asis = qr/(?:[\x00-\x20A-Za-z0-9$specials])/;
+my $re_encoded = qr/(?:[^\x00-\x20A-Za-z0-9$specials])/;
 my $e_utf16 = find_encoding("UTF-16BE");
 sub needs_lines { 1 };
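To illustrate what the two character classes do before and after the patch, here is a minimal sketch. It is not the module's own code: utf7_sketch is a hypothetical helper that only mimics the as-is/encoded loop, and the optional direct characters are left out for brevity.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);
use MIME::Base64 qw(encode_base64);

my $specials = quotemeta "'(),-.:?";

# Character classes before and after the patch.
my $old_asis    = qr/[\sA-Za-z0-9$specials]/;
my $old_encoded = qr/[^\sA-Za-z0-9$specials]/;
my $new_asis    = qr/[\x00-\x20A-Za-z0-9$specials]/;
my $new_encoded = qr/[^\x00-\x20A-Za-z0-9$specials]/;

my $utf16be = find_encoding("UTF-16BE");

# Hypothetical helper: a stripped-down encoder loop for illustration only.
sub utf7_sketch {
    my ($str, $re_asis, $re_encoded) = @_;
    my $out = '';
    pos($str) = 0;
    while (pos($str) < length($str)) {
        if ($str =~ /\G($re_asis+)/gc) {
            $out .= $1;                    # run of direct characters, copied as-is
        }
        elsif ($str =~ /\G($re_encoded+)/gc) {
            my $run = $1;
            if ($run eq '+') { $out .= '+-'; next; }
            my $b64 = encode_base64($utf16be->encode($run), '');
            $b64 =~ s/=+\z//;              # modified Base64 drops the padding
            $out .= "+$b64-";
        }
    }
    return $out;
}

my $in = "a\x{3000}b";
printf "old: %s\n",
    utf7_sketch($in, $old_asis, $old_encoded) eq $in
        ? "U+3000 passed through unencoded"
        : "U+3000 encoded";
print "new: ", utf7_sketch($in, $new_asis, $new_encoded), "\n";   # a+MAA-b
```

With the old classes the ideographic space is treated as a direct character and leaks into the output; with the patched classes it lands in the Base64 run.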
Since this is derived from Unicode::String->utf7(), I am also mailing
this to Gisle so the corresponding part in Unicode::String can be fixed.
From Unicode::String (lines 208-210 and 215-217):

    if (($UTF7_OPTIONAL_DIRECT_CHARS &&
         $$self =~ /\G((?:\0[A-Za-z0-9\'\(\)\,\-\.\/\:\?\!\"\#\$\%\&\*\;\<\=\>\@\[\]\^\_\`\{\|\}\s])+)/gc)
        || $$self =~ /\G((?:\0[A-Za-z0-9\'\(\)\,\-\.\/\:\?\s])+)/gc)
    [snip]
    elsif (($UTF7_OPTIONAL_DIRECT_CHARS &&
            $$self =~ /\G((?:[^\0].|\0[^A-Za-z0-9\'\(\)\,\-\.\/\:\?\!\"\#\$\%\&\*\;\<\=\>\@\[\]\^\_\`\{\|\}\s])+)/gsc)
           || $$self =~ /\G((?:[^\0].|\0[^A-Za-z0-9\'\(\)\,\-\.\/\:\?\s])+)/gsc)

In the era of Unicode, beware of \s.
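
For anyone who wants to see how far \s reaches, the following sketch lists the code points in the Basic Multilingual Plane above U+00FF that \s accepts but the range [\x00-\x20] does not. The exact list is an assumption-free enumeration, but it depends on the Unicode version bundled with your perl, so no output is shown here; U+3000 should be among the results.

```perl
use strict;
use warnings;

# BMP code points above U+00FF, skipping surrogates and the
# two noncharacters at the very end of the plane.
my @candidates = grep { $_ < 0xD800 || $_ > 0xDFFF } 0x100 .. 0xFFFD;

# Keep those that \s matches but [\x00-\x20] does not.
my @extra = grep {
    my $c = chr;
    $c =~ /\s/ && $c !~ /[\x00-\x20]/;
} @candidates;

printf "U+%04X\n", $_ for @extra;
```

Every code point printed is one that a class built on \s would silently treat as directly representable.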
Dan the Encode Maintainer