perl-unicode

Re: real UTF-8 vs. utf8n_to_uvuni()

2004-12-06 05:30:08
On Sun, Dec 05, 2004 at 11:58:54AM +0900, Dan Kogai wrote:

Since Gisle's patch makes use of utf8n_to_uvuni(), it seems to be a 
problem in the perl core.  So I have checked utf8.c, which defines it.  
It seems that it does not make use of PERL_UNICODE_MAX.

The patch against utf8.c fixes that.

But it breaks 2 core tests, t/op/tr.t and ext/Unicode/Normalize/t/illegal.t:

--- perl-5.8.x/utf8.c   Wed Nov 17 23:11:04 2004
+++ perl-5.8.x.dan/utf8.c       Sun Dec  5 11:38:52 2004
@@ -429,6 +429,13 @@
        }
        else
            uv = UTF8_ACCUMULATE(uv, *s);
+       /* Checks if ord() > 0x10FFFF -- dankogai */
+       if (uv > PERL_UNICODE_MAX){
+           if (!(flags & UTF8_ALLOW_LONG)) {
+               warning = UTF8_WARN_LONG;
+               goto malformed;
+           }
+       }
        if (!(uv > ouv)) {
            /* These cannot be allowed. */
            if (uv == ouv) {
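
For readers without utf8.c to hand, here is a rough standalone sketch of the
kind of check the hunk above adds: reject a decoded value once it exceeds
0x10FFFF unless the caller opts in. It is not the perl implementation.
sketch_decode(), SKETCH_UNICODE_MAX and the allow_super flag are hypothetical
names made up for this illustration, and the sketch only handles sequences of
up to four bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: same limit the patch comment names (0x10FFFF). */
#define SKETCH_UNICODE_MAX 0x10FFFFUL

/* Hypothetical helper, not part of perl.
 * Decodes one UTF-8 sequence of at most 4 bytes from s[0..len-1].
 * Returns the number of bytes consumed, or 0 if the sequence is malformed.
 * When allow_super is 0, values above SKETCH_UNICODE_MAX count as malformed. */
static size_t sketch_decode(const unsigned char *s, size_t len,
                            int allow_super, uint32_t *out)
{
    /* Smallest value that legitimately needs 2, 3 or 4 bytes. */
    static const uint32_t min_for_len[5] = { 0, 0, 0x80, 0x800, 0x10000 };
    size_t need, i;
    uint32_t uv;

    assert(s != NULL && out != NULL);
    if (len == 0)
        return 0;

    if (s[0] < 0x80) {                      /* plain ASCII */
        *out = s[0];
        return 1;
    }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; uv = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; uv = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; uv = s[0] & 0x07; }
    else
        return 0;   /* stray continuation byte, or a longer form (out of scope) */

    if (len < need)
        return 0;   /* truncated sequence */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                       /* not a continuation byte */
        uv = (uv << 6) | (uint32_t)(s[i] & 0x3F);
        /* The check the patch is about: stop once the value passes the limit. */
        if (!allow_super && uv > SKETCH_UNICODE_MAX)
            return 0;
    }

    if (uv < min_for_len[need])
        return 0;                           /* overlong encoding */

    *out = uv;
    return need;
}
```

With this sketch, the bytes F4 8F BF BF decode to 0x10FFFF either way, while
F4 90 80 80 (which would be 0x110000) is accepted only when allow_super is
non-zero.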

(this is utf8 mangled by an 8 bit terminal)

not ok 54 - translit w/complement
# Failed at t/op/tr.t line 229
Wide character in print at ./test.pl line 48.
#      got 'ĬÃ
               ĭĬÃ
                    Ä­'
Wide character in print at ./test.pl line 48.
# expected 'Ä­Ã
               Ä­Ä­Ã
                    Ä­'
ok 55
ok 56 - translit w/deletion
ok 57
ok 58 - translit w/squeeze
ok 59
ok 60
ok 61
ok 62
ok 63 - UTF range
ok 64
ok 65
ok 66
ok 67
ok 68
not ok 69
# Failed at t/op/tr.t line 288
Wide character in print at ./test.pl line 48.
#      got 'È'
# expected 'X'
not ok 70
# Failed at t/op/tr.t line 291
Wide character in print at ./test.pl line 48.
#      got 'È'
# expected 'X'


and

not ok 91
# Failed test 91 in ext/Unicode/Normalize/t/illegal.t at line 53 fail #10
not ok 92
# Failed test 92 in ext/Unicode/Normalize/t/illegal.t at line 54 fail #10
not ok 93
# Failed test 93 in ext/Unicode/Normalize/t/illegal.t at line 55 fail #10
not ok 94
# Failed test 94 in ext/Unicode/Normalize/t/illegal.t at line 56 fail #10
ok 95
not ok 96
# Failed test 96 in ext/Unicode/Normalize/t/illegal.t at line 58 fail #10
not ok 97
# Failed test 97 in ext/Unicode/Normalize/t/illegal.t at line 59 fail #10
not ok 98
# Failed test 98 in ext/Unicode/Normalize/t/illegal.t at line 60 fail #10
not ok 99
# Failed test 99 in ext/Unicode/Normalize/t/illegal.t at line 61 fail #10
not ok 100
# Failed test 100 in ext/Unicode/Normalize/t/illegal.t at line 62 fail #10
not ok 101
# Failed test 101 in ext/Unicode/Normalize/t/illegal.t at line 53 fail #11
not ok 102
# Failed test 102 in ext/Unicode/Normalize/t/illegal.t at line 54 fail #11
not ok 103
# Failed test 103 in ext/Unicode/Normalize/t/illegal.t at line 55 fail #11
not ok 104
# Failed test 104 in ext/Unicode/Normalize/t/illegal.t at line 56 fail #11
ok 105
not ok 106
# Failed test 106 in ext/Unicode/Normalize/t/illegal.t at line 58 fail #11
not ok 107
# Failed test 107 in ext/Unicode/Normalize/t/illegal.t at line 59 fail #11
not ok 108
# Failed test 108 in ext/Unicode/Normalize/t/illegal.t at line 60 fail #11
not ok 109
# Failed test 109 in ext/Unicode/Normalize/t/illegal.t at line 61 fail #11
not ok 110
# Failed test 110 in ext/Unicode/Normalize/t/illegal.t at line 62 fail #11
ok 111
ok 112

I don't know what is at fault here, the tests, or the patch.

Nicholas Clark