[Encode] UCS/UTF mess and Surrogate Handlings

On Friday, April 5, 2002, at 11:10 , Jarkko Hietaniemi wrote:

Change 15745 by jhi(_at_)alpha on 2002/04/05 13:07:21

        Integrate perlio;
        
        Not only did UCS-2 have dodgy name it was buggy.

Affected files ...

... //depot/perl/ext/Encode/lib/Encode/10646_1.pm#4 integrate

Differences ...


I've just ci'd 1.21 before I got this.   Hell.  1.22 that is.

-__PACKAGE__->Define(qw(UCS-2));
+__PACKAGE__->Define(qw(UCS-2BE UCS-2));


This one was done (with UCS-2 relocated to Alias.pm)

@@ -30,7 +30,7 @@
     {
        my $ch = substr($uni,0,1,'');
        my $x  = ord($ch);
-       unless ($x < 32768)
+       unless ($x <= 0xffff)
        {
            last if ($chk);
            $x = 0;
End of Patch.
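
To see why the old bound was a bug, here is a minimal sketch comparing the two guards. The subs old_guard_ok and new_guard_ok are hypothetical stand-ins for the conditions in the hunk above, not code from 10646_1.pm.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the guard before and after the patch.
sub old_guard_ok { my ($x) = @_; return $x < 32768 ? 1 : 0 }
sub new_guard_ok { my ($x) = @_; return $x <= 0xffff ? 1 : 0 }

# 0x8000..0xffff still fit in one 16-bit unit, so only the new guard
# accepts the whole BMP; 0x10000 is rejected by both.
for my $x (0x7fff, 0x8000, 0xffff, 0x10000) {
    printf "U+%04X old=%s new=%s\n", $x,
        old_guard_ok($x) ? 'ok' : 'rejected',
        new_guard_ok($x) ? 'ok' : 'rejected';
}
```

The old test treated the upper half of the BMP (0x8000 and up) as out of range, even though pack('n', ...) can represent every value up to 0xffff.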

I have reviewed the code following this and found that this is *really* UCS-2BE, not UTF-16, in the sense that it does not handle surrogates (encode() simply croaks for chars above the BMP). Internally perl does support 0x10000 and above, so why not support UTF-16 AND UCS-2 CORRECTLY and DISTINCTLY? I also found that UTF-32 is missing (well, no one uses it yet, but it is well specified by the Unicode Consortium). I'll clean up the UCS/UTF mess. It won't take much time.
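
For illustration only, the difference between the two could be sketched roughly like this. The names to_surrogates and encode_utf16be are hypothetical helpers made up for this example, not anything in Encode. A UCS-2 encoder would croak (or substitute) in the branch where this one emits a surrogate pair. Lone surrogate code points in the input are not checked here.

```perl
use strict;
use warnings;

# Hypothetical helper: split a code point above the BMP into a
# UTF-16 surrogate pair (high surrogate first).
sub to_surrogates {
    my ($cp) = @_;
    die sprintf("U+%X is not above the BMP", $cp)
        unless $cp >= 0x10000 && $cp <= 0x10FFFF;
    my $v  = $cp - 0x10000;            # 20-bit value
    my $hi = 0xD800 + ($v >> 10);      # top 10 bits
    my $lo = 0xDC00 + ($v & 0x3FF);    # bottom 10 bits
    return ($hi, $lo);
}

# Hypothetical helper: big-endian UTF-16 without a BOM.
sub encode_utf16be {
    my ($str) = @_;
    my $out = '';
    for my $ch (split //, $str) {
        my $x = ord $ch;
        if ($x <= 0xFFFF) {
            $out .= pack('n', $x);                   # one 16-bit unit
        }
        else {
            $out .= pack('n2', to_surrogates($x));   # two 16-bit units
        }
    }
    return $out;
}

# "A" followed by U+10000 gives 0041 D800 DC00.
print unpack('H*', encode_utf16be("A\x{10000}")), "\n";
```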


Oh, the same bug was there in UCS-2LE.

Dan the Encode Maintainer

P.S. Does utf8 support surrogates? The surrogate pair is definitely the ugliest SOB of Unicode, but without it we can't print \x{10000}-\x{10ffff} to the stream....
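
On the utf8 question: UTF-8 as specified encodes code points above the BMP directly as four-byte sequences, so no surrogate pair is involved. A minimal sketch, assuming a perl new enough to have the core utf8::encode function (utf8_hex is a hypothetical helper for this example):

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 bytes of a string as hex.
sub utf8_hex {
    my ($str) = @_;
    my $bytes = $str;
    utf8::encode($bytes);    # in-place: characters -> UTF-8 octets
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

print utf8_hex("\x{10000}"), "\n";    # prints "F0 90 80 80"
```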
