On Mon, 13 Aug 2001 10:07:42 -0500
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> wrote:
On Mon, Aug 13, 2001 at 10:35:32PM +0900, SADAHIRO Tomoyuki wrote:
Hello, everyone.
Sort::UCA 0.04 has been uploaded on CPAN.
snip
To Do: conformance tests of Unicode 3.1.1 Beta
(at present it's DRAFT).
When it passes I will probably grab it into the 5.8.0-to-be.
The result on Perl v5.7.2:
cf. http://www.unicode.org/Public/BETA/UCA/CollationTest.html
Failed Test  Stat Wstat Total  Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/CT_NI.t               95940 28146  29.34%  67795-95940
t/CT_S.t                95940 28146  29.34%  67795-95940
Failed 2/2 test scripts, 0.00% okay. 56292/191880 subtests failed, 70.66% okay.
These failures are due to a miscalculation of the
weights for unassigned characters (cf. 7.1.2 Legal code points, UTR #10)
in the CollationTest files.
I've reported it to errata(_at_)unicode(_dot_)org.
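For reference, here is a minimal sketch of how weights for a code point
without an explicit table entry are derived. It follows the general shape
given in later UTS #10 revisions; the BASE constant (0xFBC0) and the name
implicit_primaries are assumptions for illustration, and the 3.1.1 beta
may have used different values.

```perl
use strict;
use warnings;

# Assumption: BASE as given for unassigned code points in later UTS #10
# revisions. The draft discussed here may have used another constant.
my $BASE = 0xFBC0;

# Hypothetical helper: return the two primary weights (AAAA, BBBB) that
# stand in for a code point with no explicit entry in the table.
sub implicit_primaries {
    my ($cp) = @_;
    my $aaaa = $BASE + ($cp >> 15);        # high bits select the lead weight
    my $bbbb = ($cp & 0x7FFF) | 0x8000;    # low 15 bits, top bit forced on
    return ($aaaa, $bbbb);
}

# 0x0378 is used here only as an example of a code point assumed to have
# no table entry.
printf "%04X %04X\n", implicit_primaries(0x0378);   # prints "FBC0 8378"
```

If a test file and an implementation disagree on either half of this pair,
every line containing such a code point will compare unequal on its key.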
This is the script used for the above test.
(This is ONLY for your information; NOT a patch to perl.)
##BEGIN##
diff -urN dummy/CT_NI.t t/CT_NI.t
--- dummy/CT_NI.t Thu Jan 01 09:00:00 1970
+++ t/CT_NI.t Wed Aug 15 18:55:38 2001
@@ -0,0 +1,28 @@
+use strict;
+use Test;
+use warnings;
+use Sort::UCA 0.05;
+
+BEGIN { plan tests => 95940 }
+
+open FH, "<CollationTest_NON_IGNORABLE.txt" or die $!;
+my $UCA = Sort::UCA->new( alternate => "non-ignorable" ) or die $@;
+
+my $preKey = "";
+my $preUTF8 = "";
+
+while(<FH>){
+ my($stdKey);
+ chomp;
+ s/(\[.*\])// and $stdKey = $1;
+ my $r = $_;
+ s/;.*//;
+ my @u = Sort::UCA::_getHexArray($_);
+ my $curUTF8 = pack('U*', @u);
+ my $curKey = $UCA->viewSortKey($curUTF8);
+ my $expect = $curKey ne $preKey;
+ my $result = $UCA->cmp($curUTF8, $preUTF8);
+ $preKey = $curKey;
+ $preUTF8 = $curUTF8;
+ ok($result == $expect && $curKey eq $stdKey);
+}
diff -urN dummy/CT_S.t t/CT_S.t
--- dummy/CT_S.t Thu Jan 01 09:00:00 1970
+++ t/CT_S.t Wed Aug 15 18:51:44 2001
@@ -0,0 +1,28 @@
+use strict;
+use warnings;
+use Test;
+use Sort::UCA 0.05;
+
+BEGIN { plan tests => 95940 }
+
+open FH, "<CollationTest_SHIFTED.txt" or die $!;
+my $UCA = Sort::UCA->new( ) or die $@;
+
+my $preKey = "";
+my $preUTF8 = "";
+
+while(<FH>){
+ my($stdKey);
+ chomp;
+ s/(\[.*\])// and $stdKey = $1;
+ my $r = $_;
+ s/;.*//;
+ my @u = Sort::UCA::_getHexArray($_);
+ my $curUTF8 = pack('U*', @u);
+ my $curKey = $UCA->viewSortKey($curUTF8);
+ my $expect = $curKey ne $preKey;
+ my $result = $UCA->cmp($curUTF8, $preUTF8);
+ $preKey = $curKey;
+ $preUTF8 = $curUTF8;
+ ok($result == $expect && $curKey eq $stdKey);
+}
##END##
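The per-line parsing step in the scripts above can be sketched on its own,
without the module, roughly as follows. parse_test_line is a hypothetical
stand-in for the private Sort::UCA::_getHexArray call, and the sample line
is made up to resemble the general shape of the test data; it is not
copied from the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: split one test-file line into its code points and
# the bracketed sort key, if the line carries one.
sub parse_test_line {
    my ($line) = @_;
    my $key;
    $line =~ s/(\[.*\])// and $key = $1;   # pull out "[...]" when present
    $line =~ s/;.*//;                      # drop everything after the code points
    my @cps = map { hex($_) }
              grep { /^[0-9A-Fa-f]+$/ }
              split ' ', $line;
    return (\@cps, $key);
}

# Made-up sample line, for illustration only.
my ($cps, $key) = parse_test_line("0061 0062; # [0A15 0A29 | 0020 0020 |]");

# Code points -> Perl string, as the test scripts do before comparing.
my $str = pack('U*', @$cps);
```

Each string built this way is then compared against the string from the
previous line, and its sort key against the key given in the file.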
(how about Unicode::Sort as the name?)
Unicode::Sort is also good, but
Unicode::Collate might be more consistent with Unicode::Normalize.
cf. Unicode Normalization Forms => Unicode::Normalize
    Unicode Collation Algorithm => Unicode::Collate
Regards, SADAHIRO Tomoyuki