On Dec 05, 2004, at 10:56, Dan Kogai wrote:
Thanks, applied in my repository. New tests and documentation fix in
progress. When I am done w/ that, I will release Encode-2.0901 on my
web (not CPAN yet). When cross-checks by porters are done I will
release Encode-2.10.
Dan the Encode Maintainer
Now I am writing test suites and have found that some of the
strictures are missing.
Surrogate -- OK
% perl -Mblib -MEncode -le '$a="\x{d801}"; print encode("UTF-8", $a, 1)'
"\x{d801}" does not map to utf8 at
/gs1/dankogai/work/Encode/blib/lib/Encode.pm line 150.
U+FFFF -- OK
% perl -Mblib -MEncode -le '$a="\x{ffff}"; print encode("UTF-8", $a, 1)'
"\x{ffff}" does not map to utf8 at
/gs1/dankogai/work/Encode/blib/lib/Encode.pm line 150.
Chars above U+10FFFF -- NOT OK
% perl -Mblib -MEncode -le '$a="\x{11ffff}"; print encode("UTF-8", $a, 1)'
????
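For reference, the three checks exercised above can be written down as a small predicate. This is only a sketch of what I would expect a strict encoder to reject; strict_utf8_ok() is a hypothetical name, not anything in Encode or perl, and it covers only the values tested here (other noncharacters are not considered).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of Encode or perl.
 * Returns true when cp passes the three checks tried above. */
static bool strict_utf8_ok(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogates */
        return false;
    if (cp == 0xFFFF)                   /* the U+FFFF case tested above */
        return false;
    if (cp > 0x10FFFF)                  /* beyond the Unicode range */
        return false;
    return true;
}
```

With this, 0xD801, 0xFFFF and 0x11FFFF are all rejected, which is the behaviour the third test above fails to show.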
Since Gisle's patch makes use of utf8n_to_uvuni(), this seems to be a
problem in the perl core. So I checked utf8.c, which defines it.
It seems that it does not make use of PERL_UNICODE_MAX.
The patch against utf8.c at the end of this message fixes that.
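To illustrate the idea of the check outside of perl, here is a minimal stand-alone sketch. It is not utf8n_to_uvuni(); sketch_decode() and SKETCH_UNICODE_MAX are made-up names, it handles only sequences of up to 4 bytes, and it does not check for overlong forms. The only point is that the upper bound is tested right after each continuation byte is accumulated.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PERL_UNICODE_MAX (assumed to be 0x10FFFF). */
#define SKETCH_UNICODE_MAX 0x10FFFFUL

/* Hypothetical decoder: decodes one sequence of 1 to 4 bytes from s.
 * Returns 0 and stores the code point in *out on success, -1 otherwise. */
static int sketch_decode(const unsigned char *s, size_t len, unsigned long *out)
{
    unsigned long uv;
    size_t expect, i;

    if (len == 0)
        return -1;
    if (s[0] < 0x80)                { uv = s[0];        expect = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { uv = s[0] & 0x1F; expect = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { uv = s[0] & 0x0F; expect = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { uv = s[0] & 0x07; expect = 4; }
    else
        return -1;                  /* stray continuation or longer form */
    if (len < expect)
        return -1;

    for (i = 1; i < expect; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return -1;              /* not a continuation byte */
        uv = (uv << 6) | (s[i] & 0x3F);
        if (uv > SKETCH_UNICODE_MAX)
            return -1;              /* the upper-bound check */
    }
    *out = uv;
    return 0;
}
```

Fed the bytes F4 8F BF BF (U+10FFFF) this succeeds, while F4 9F BF BF (which would be 0x11FFFF) is refused.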
> ~/danperl/bin/perl5.8.6 -Mblib -MEncode -le '$a="\x{11FFFF}"; print encode("UTF-8", $a, 1)'
"\x{00f4}" does not map to utf8 at
/gs1/dankogai/work/Encode/blib/lib/Encode.pm line 150.
As you see, the warning is still funny: it names "\x{00f4}" instead of
the offending character. Any case that ends up with UTF8_WARN_LONG is
funny in the same way, as follows:
> perl -Mblib -MEncode -le '$a="\x{7fff_ffff}"; print encode("UTF-8", $a, 1)'
??????
> perl -Mblib -MEncode -le '$a="\x{8000_0000}"; print encode("UTF-8", $a, 1)'
"\x{00fe}" does not map to utf8 at
/gs1/dankogai/work/Encode/blib/lib/Encode.pm line 150.
I have tracked this down and found that the warning is generated by
Encode, so Gisle and I can fix that.
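One possible reading of the odd "\x{00f4}" and "\x{00fe}" in the messages above is that they are the lead bytes of the encoded sequences rather than the characters themselves. The sketch below computes the lead byte under the assumption that perl's extended UTF-8 follows the usual length-prefix scheme, with a 7-byte form starting with 0xFE for values of 2**31 and up; sketch_lead_byte() is a hypothetical helper and is only meant for values below 2**36.

```c
#include <stdint.h>

/* Hypothetical helper: lead byte of the (extended) UTF-8 encoding of cp.
 * Assumes cp < 2**36; larger values are outside this sketch. */
static unsigned char sketch_lead_byte(uint64_t cp)
{
    if (cp < 0x80ULL)       return (unsigned char)cp;                  /* 1 byte  */
    if (cp < 0x800ULL)      return (unsigned char)(0xC0 | (cp >> 6));  /* 2 bytes */
    if (cp < 0x10000ULL)    return (unsigned char)(0xE0 | (cp >> 12)); /* 3 bytes */
    if (cp < 0x200000ULL)   return (unsigned char)(0xF0 | (cp >> 18)); /* 4 bytes */
    if (cp < 0x4000000ULL)  return (unsigned char)(0xF8 | (cp >> 24)); /* 5 bytes */
    if (cp < 0x80000000ULL) return (unsigned char)(0xFC | (cp >> 30)); /* 6 bytes */
    return 0xFE;  /* 7-byte form: the lead byte carries no payload bits */
}
```

Under that assumption 0x11FFFF gives 0xF4 and 0x80000000 gives 0xFE, which would match the two messages.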
Dan the Encode Maintainer
--- perl-5.8.x/utf8.c Wed Nov 17 23:11:04 2004
+++ perl-5.8.x.dan/utf8.c Sun Dec 5 11:38:52 2004
@@ -429,6 +429,13 @@
         }
         else
             uv = UTF8_ACCUMULATE(uv, *s);
+        /* Checks if ord() > 0x10FFFF -- dankogai */
+        if (uv > PERL_UNICODE_MAX){
+            if (!(flags & UTF8_ALLOW_LONG)) {
+                warning = UTF8_WARN_LONG;
+                goto malformed;
+            }
+        }
         if (!(uv > ouv)) {
             /* These cannot be allowed. */
             if (uv == ouv) {