On Tuesday, May 13, 2003, at 11:48 PM, John Jenkins wrote:
Stroke order, then, is something different. Seems like we would need
order entries in the config data for every character, which would be
totally unmanageable.
I didn't have any luck searching the Unicode web site for information
about sorting by stroke.
There is a kTotalStrokes field in Unihan.txt, although it doesn't
cover every character in Unihan. This would definitely be a good
place to start.
If you are using Perl 5.6.0 or higher (5.8.0 recommended), you can use the
Unicode::Unihan module, available via CPAN. Let me show you a small
example.
#!/usr/local/bin/perl
use strict;
use Unicode::Unihan;
my $uh = Unicode::Unihan->new;
my $str = "\x{5c0f}\x{98fc}\x{5f3e}"; # my name in Kanji
my @chars = map {chr($_)} unpack("U*" => $str);
my @strokes = $uh->TotalStrokes($str);
my %c2s; @c2s{@chars} = @strokes;
binmode STDOUT => ':utf8';
for my $char (sort {$c2s{$a} <=> $c2s{$b} || $a cmp $b} @chars){
print "$char => $c2s{$char}\n";
}
__END__
And here is what it prints.
小 => 3
弾 => 12
飼 => 14
I am not sure whether Unicode::Unihan is robust enough for practical use,
but IMHO it is a handy place to start.
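
Since kTotalStrokes does not cover every character, a stroke sort probably
needs a fallback for characters with no count. Here is a minimal sketch of
one way to do that, without Unicode::Unihan. It assumes the tab-separated
Unihan.txt layout (code point, field name, value). The sample lines just
mirror the output above rather than a real copy of Unihan.txt, and
by_strokes is a name I made up for illustration. U+3042 is a hiragana
letter, so it has no stroke entry and sorts last.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Sample lines in the Unihan.txt layout: code point, field, value (tab-separated).
# The counts mirror the output shown above; they are illustrative only.
my @lines = (
    "U+5C0F\tkTotalStrokes\t3",
    "U+5F3E\tkTotalStrokes\t12",
    "U+98FC\tkTotalStrokes\t14",
);

# Build a character => stroke count table from the kTotalStrokes lines.
my %strokes;
for my $line (@lines) {
    my ($cp, $field, $value) = split /\t/, $line;
    next unless defined $field && $field eq 'kTotalStrokes';
    next unless $cp =~ /^U\+([0-9A-Fa-f]+)$/;
    $strokes{chr hex $1} = $value;
}

# Compare by stroke count, then by code point.
# Characters without a stroke count sort after all the others.
sub by_strokes {
    my ($sa, $sb) = ($strokes{$a}, $strokes{$b});
    return -1 if  defined $sa && !defined $sb;
    return  1 if !defined $sa &&  defined $sb;
    return $a cmp $b unless defined $sa;    # neither has a count
    return $sa <=> $sb || $a cmp $b;
}

# U+3042 is not a Han character, so it has no entry in the table.
my @chars = ("\x{98fc}", "\x{3042}", "\x{5f3e}", "\x{5c0f}");

binmode STDOUT => ':utf8';
for my $char (sort by_strokes @chars) {
    my $count = defined $strokes{$char} ? $strokes{$char} : 'unknown';
    print "$char => $count\n";
}
```

Putting the unknowns last is just one choice. You could equally skip them
or fall back to plain code point order for the whole list.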
Dan the Perl5 Porter