On Friday, April 5, 2002, at 11:10, Jarkko Hietaniemi wrote:
Change 15745 by jhi(_at_)alpha on 2002/04/05 13:07:21
Integrate perlio;
Not only did UCS-2 have a dodgy name, it was buggy.
Affected files ...
... //depot/perl/ext/Encode/lib/Encode/10646_1.pm#4 integrate
Differences ...
I've just ci'd 1.21 before I got this. Hell. 1.22 that is.
-__PACKAGE__->Define(qw(UCS-2));
+__PACKAGE__->Define(qw(UCS-2BE UCS-2));
This one was done (with UCS-2 relocated to Alias.pm)
@@ -30,7 +30,7 @@
{
my $ch = substr($uni,0,1,'');
my $x = ord($ch);
- unless ($x < 32768)
+ unless ($x <= 0xffff)
{
last if ($chk);
$x = 0;
End of Patch.
I have reviewed the code following this and found that this is *really*
UCS-2BE, not UTF-16, in the sense that it does not handle surrogates (encode()
simply croaks for chars above the BMP). Internally perl does support
0x10000 and above, so why not support UTF-16 AND UCS-2 CORRECTLY and
DISTINCTLY? I also found that UTF-32 is missing (well, no one uses it yet,
but it is well defined by the Unicode Consortium). I'll clean up the
UCS/UTF mess. It won't take much time.
Oh, the same bug was there in UCS-2LE.
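
To make the UCS-2 vs. UTF-16 distinction concrete, here is a minimal sketch in plain Perl. It is not the Encode code: encode_ucs2be() and encode_utf16be() are hypothetical helper names, the out-of-range handling (returning undef) is an assumption rather than what the patched module does, and lone surrogates in the input are not validated.

```perl
use strict;
use warnings;

# Hypothetical helper: UCS-2BE covers the BMP only (0x0000..0xFFFF).
# Returns undef for anything above the BMP; a real codec might croak
# or substitute instead (assumption for this sketch).
sub encode_ucs2be {
    my ($str) = @_;
    my $out = '';
    for my $ch (split //, $str) {
        my $x = ord $ch;
        return if $x > 0xFFFF;      # same upper bound as the patched check
        $out .= pack 'n', $x;       # one 16-bit big-endian unit
    }
    return $out;
}

# Hypothetical helper: UTF-16BE, with surrogate pairs above the BMP.
sub encode_utf16be {
    my ($str) = @_;
    my $out = '';
    for my $ch (split //, $str) {
        my $x = ord $ch;
        if ($x > 0xFFFF) {
            $x -= 0x10000;                      # 20 bits remain
            my $hi = 0xD800 + ($x >> 10);       # high (leading) surrogate
            my $lo = 0xDC00 + ($x & 0x3FF);     # low (trailing) surrogate
            $out .= pack 'nn', $hi, $lo;
        }
        else {
            $out .= pack 'n', $x;
        }
    }
    return $out;
}

print unpack('H*', encode_ucs2be("\x{8000}")), "\n";    # 8000
print unpack('H*', encode_utf16be("\x{10000}")), "\n";  # d800dc00
```

Note that U+8000 is a perfectly good BMP character, which the old "$x < 32768" test would have rejected.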
Dan the Encode Maintainer
P.S. Does utf8 support surrogates? The surrogate pair is definitely the
ugliest SOB of Unicode, but without it we can't print
\x{10000}-\x{10ffff} to the stream....
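
On the P.S.: surrogate pairs are a UTF-16 mechanism, and UTF-8 encodes code points above the BMP directly as four octets. A small sketch of that, using the core utf8::encode() function; utf8_hex() is a hypothetical helper name used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 octets of a string as hex.
sub utf8_hex {
    my ($s) = @_;           # $s is a copy, so the caller's string is untouched
    utf8::encode($s);       # in-place conversion from characters to UTF-8 octets
    return unpack 'H*', $s;
}

print utf8_hex("\x{10000}"), "\n";   # f0908080
print utf8_hex("\x{10FFFF}"), "\n";  # f48fbfbf
```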