"Jarkko" == Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
However, the problem with this is that $casefold comes back as undef.
Jarkko> Because the U+09DC has no 'special folding' (nor 'special
Jarkko> casing', for that matter). It has only the 'usual' cases
Jarkko> (which can be retrieved using charinfo()).
OK, thank you.
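For anyone following along, here is a minimal sketch of the lookup Jarkko describes. The variable names are mine; charinfo() and its upper, lower and title fields are as documented in Unicode::UCD.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::UCD qw(charinfo casefold);

# U+09DC is the code point discussed above.
my $code = 0x09DC;

# charinfo() returns a hash reference describing the code point,
# including its 'usual' case mappings.
my $info = charinfo($code);
die sprintf("U+%04X has no charinfo.\n", $code) unless defined $info;

printf "U+%04X %s: upper='%s' lower='%s' title='%s'\n",
    $code, $info->{name},
    $info->{upper}, $info->{lower}, $info->{title};

# casefold() returns undef when there is no folding entry.
printf "casefold is %s\n", defined casefold($code) ? "defined" : "undef";
```

For a letter without case, I would expect the three mapping fields to be empty strings.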
Another suggestion is therefore that the examples in the docs be
changed to use codepoints that do have special folding or casing.
Here's what I see now. I grepped the first few code points out of
CaseFolding-3.2.0.txt that have a full case folding (status F):
    $ grep ' F;' CaseFolding-3.2.0.txt | head | cut -f1 -d\;
    00DF
    0130
    0149
    01F0
    0390
    03B0
    0587
    1E96
    1E97
    1E98
and I put these into a short script:
    #!/usr/bin/perl -w
    use strict;
    use Unicode::UCD qw(casespec casefold charinfo);

    # Code points (as hex strings) taken from the grep output above.
    foreach my $cp (qw(00DF 0130 0149 01F0 0390 03B0 0587 1E96 1E97 1E98)) {
        my $info = charinfo(hex($cp));
        die "$0: $cp has no charinfo.\n" unless defined $info;
        # Report whether casefold() and casespec() return a defined value.
        printf "U+$cp: %-53s fold=%d, spec=%d\n",
            $info->{name},
            defined casefold($cp) ? 1 : 0,
            defined casespec($cp) ? 1 : 0;
    }
and expected to see that casefold (at least) gave a defined value for
each. But instead I see the following output:
    U+00DF: LATIN SMALL LETTER SHARP S                            fold=1, spec=1
    U+0130: LATIN CAPITAL LETTER I WITH DOT ABOVE                 fold=0, spec=0
    U+0149: LATIN SMALL LETTER N PRECEDED BY APOSTROPHE           fold=0, spec=0
    U+01F0: LATIN SMALL LETTER J WITH CARON                       fold=1, spec=1
    U+0390: GREEK SMALL LETTER IOTA WITH DIALYTIKA AND TONOS      fold=0, spec=0
    U+03B0: GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS   fold=1, spec=1
    U+0587: ARMENIAN SMALL LIGATURE ECH YIWN                      fold=0, spec=0
    U+1E96: LATIN SMALL LETTER H WITH LINE BELOW                  fold=1, spec=1
    U+1E97: LATIN SMALL LETTER T WITH DIAERESIS                   fold=1, spec=1
    U+1E98: LATIN SMALL LETTER W WITH RING ABOVE                  fold=1, spec=1
Is there anything wrong here? If not, I guess there's something pretty
fundamental going on here that I don't understand. Why would U+00DF have
folding information but U+0149 not have it?
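One pattern I notice in the output: the four code points that come back with fold=0 (0130, 0149, 0390, 0587) are exactly the ones whose hex spelling happens to contain only decimal digits, and the script hands casefold() and casespec() the string $cp while charinfo() gets hex($cp). My guess, which I have not verified against the Unicode::UCD source, is that those strings are being read as decimal numbers. A variant that passes the numeric value everywhere would test that guess (the names $hex and $code are mine):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::UCD qw(casespec casefold charinfo);

foreach my $hex (qw(00DF 0130 0149 01F0 0390 03B0 0587 1E96 1E97 1E98)) {
    # Convert once, then hand the same number to all three functions,
    # so no function has to guess whether the string is hex or decimal.
    my $code = hex($hex);

    my $info = charinfo($code);
    die "$0: $hex has no charinfo.\n" unless defined $info;

    printf "U+%s: %-53s fold=%d, spec=%d\n",
        $hex,
        $info->{name},
        defined casefold($code) ? 1 : 0,
        defined casespec($code) ? 1 : 0;
}
```

If the guess is right, this version should report fold=1 and spec=1 on every line.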
Regards,
Terry.