On Sat, 10 May 2003 22:45:52 +0200
terry(_at_)eatoni(_dot_)com (terry jones) wrote:
and I put these into a short script:
#!/usr/bin/perl -w
use strict;
use Unicode::UCD qw(casespec casefold charinfo);

foreach my $cp (qw(00DF 0130 0149 01F0 0390 03B0 0587 1E96 1E97 1E98)) {
    my $info = charinfo(hex($cp));
    die "$0: $cp has no charinfo.\n" unless defined $info;
    printf "U+$cp: %-53s fold=%d, spec=%d\n",
        $info->{name},
        defined casefold($cp) ? 1 : 0,
        defined casespec($cp) ? 1 : 0;
}
and expected to see that casefold (at least) gave a defined value for
each. But instead I see the following output:
U+00DF: LATIN SMALL LETTER SHARP S                            fold=1, spec=1
U+0130: LATIN CAPITAL LETTER I WITH DOT ABOVE                 fold=0, spec=0
U+0149: LATIN SMALL LETTER N PRECEDED BY APOSTROPHE           fold=0, spec=0
U+01F0: LATIN SMALL LETTER J WITH CARON                       fold=1, spec=1
U+0390: GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS      fold=0, spec=0
U+03B0: GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS   fold=1, spec=1
U+0587: ARMENIAN SMALL LIGATURE ECH YIWN                      fold=0, spec=0
U+1E96: LATIN SMALL LETTER H WITH LINE BELOW                  fold=1, spec=1
U+1E97: LATIN SMALL LETTER T WITH DIAERESIS                   fold=1, spec=1
U+1E98: LATIN SMALL LETTER W WITH RING ABOVE                  fold=1, spec=1
Is there anything wrong here? If not, I guess there's something pretty
fundamental going on here that I don't understand. Why would U+00DF have
folding information but U+0149 not have it?
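'0130', '0149', '0390' and '0587' match /^\d+$/, so they are taken as
decimal numbers rather than as hexadecimal code points; the others
contain a hex letter and don't match.
Try hex($cp), "U+$cp" or "0x$cp" instead of $cp.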
cf.
http://www.perldoc.com/perl5.8.0/lib/Unicode/UCD.html#Code-Point-Arguments
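As a rough sketch of the suggestion above, something like the following
passes each code point in a form that cannot be read as decimal. The
helper cp_from_hex() is only a name made up for this example and is not
part of Unicode::UCD; only the four code points that showed fold=0 are
checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::UCD qw(casefold);

# Hypothetical helper (not part of Unicode::UCD): turn a bare hex string
# such as "0149" into an unambiguous numeric code point.
sub cp_from_hex {
    my ($hex) = @_;
    die "not a hex string: $hex\n" unless $hex =~ /^[0-9A-Fa-f]+$/;
    return hex($hex);
}

# The four code points that reported fold=0 in the original run.
foreach my $cp (qw(0130 0149 0390 0587)) {
    my $by_number = casefold(cp_from_hex($cp));   # numeric argument
    my $by_uplus  = casefold("U+$cp");            # explicit U+ notation
    printf "U+%s: number=%d, U+=%d\n",
        $cp,
        defined $by_number ? 1 : 0,
        defined $by_uplus  ? 1 : 0;
}
```

With either form, casefold() should no longer see a string that looks
like a plain decimal number, so the result does not depend on how a
bare "0149" happens to be interpreted.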
I don't think this behavior of _getcode() would be consistent
enough, though.
SADAHIRO Tomoyuki