dankogai(_at_)dan(_dot_)co(_dot_)jp (Dan Kogai) wrote in
news:1E456D1E-E7DE-11D6-BF8B-0003939A104C(_at_)dan(_dot_)co(_dot_)jp:
On Friday, Oct 25, 2002, at 14:10 Asia/Tokyo, Philip Newton wrote:
Well, partially because there are no "good" names for many of the
characters. What do you call "生"? "CJK UNIFIED IDEOGRAPH-751F"?
(That's the current Unicode "name", but it's not particularly useful.)
"CJK shou"? "CJK sei"? "CJK sheng1"? "CJK saeng"? "CJK ikiru"? ikasu,
ikeru, umareru, umu, ou, haeru, hayasu, ki, nama, naru, nasu, musu,
.... which one do you pick?
If we are stuck with the de jure, ex officio names from the Unicode
Consortium, we are out of luck. But this is Perl: if there is more than
one way to do it, why not more than one way to name it? I am thinking
about a charnames extension that goes like
use charnames ":ja"; # Japanese
print "\N{sei-ikiru}";
#
use charnames ":ko";
print "\N{saeng}";
#
use charnames ":zh";
print "\N{sheng1}";
All ideal for the new aliasing module!
use charnames ":full", ":alias" => "ja";
which is the same as
use charnames ":alias" => ":ja";
Maybe we should supply such alias files, which place no restriction on the
number of aliases for the same long name.
Assuming sei-ikiru's real name is "CHINESE BLUBBER POND WITH FROGS", there
is no problem with
use charnames ":full", ":alias" => {
"sei-ikiru" => "CHINESE BLUBBER POND WITH FROGS",
"saeng" => "CHINESE BLUBBER POND WITH FROGS",
"sheng1" => "CHINESE BLUBBER POND WITH FROGS",
"frog-pond" => "CHINESE BLUBBER POND WITH FROGS",
};
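To make the many-aliases-to-one-character idea concrete without relying on any particular charnames feature, here is a minimal sketch in plain Perl. The table and the char_for_alias helper are hypothetical names invented for illustration; the aliases come from this thread, and U+751F is the code point behind "CJK UNIFIED IDEOGRAPH-751F" mentioned above.

```perl
use strict;
use warnings;

# Hypothetical alias table: several reading-based names for one character.
# U+751F is the code point behind "CJK UNIFIED IDEOGRAPH-751F".
my %alias_to_codepoint = (
    'sei-ikiru' => 0x751F,    # Japanese-style alias from the thread
    'saeng'     => 0x751F,    # Korean-style alias
    'sheng1'    => 0x751F,    # Chinese-style alias
);

# Return the character for an alias, or undef if the alias is unknown.
sub char_for_alias {
    my ($name) = @_;
    my $codepoint = $alias_to_codepoint{$name};
    return defined $codepoint ? chr($codepoint) : undef;
}

# All three aliases print the same character.
binmode STDOUT, ':encoding(UTF-8)';
print char_for_alias($_), "\n" for qw(sei-ikiru saeng sheng1);
```

Nothing stops the table from holding any number of aliases per character, which is the point being made about alias files.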
Forgive my ignorance of Korean, Japanese, Chinese and CJK codings in
general. Just pointing out the new wealth of possibilities.
Now we can support
unicore/ko_alias.pl
unicore/ja_alias.pl
unicore/zh_alias.pl
...
Since the pragma-based approach is rather inflexible, I would prefer an OO
approach, like
use Char::Name;
my $char = Char::Name->new;
print $char->jp("sei-ikiru");
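As a rough sketch of what such an interface could look like: Char::Name is not an existing module as far as this thread shows, so the package below is purely hypothetical, as are the ko and zh methods and the tiny built-in table. It only illustrates per-language lookup methods on an object.

```perl
package Char::Name;    # hypothetical module, defined inline for illustration
use strict;
use warnings;

# Per-language tables of reading-based names (illustrative data only).
my %READINGS = (
    jp => { 'sei-ikiru' => 0x751F },
    ko => { 'saeng'     => 0x751F },
    zh => { 'sheng1'    => 0x751F },
);

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Shared lookup: returns the character, or undef for an unknown name.
sub _lookup {
    my ($self, $lang, $name) = @_;
    my $codepoint = $READINGS{$lang}{$name};
    return defined $codepoint ? chr($codepoint) : undef;
}

sub jp { my ($self, $name) = @_; return $self->_lookup(jp => $name) }
sub ko { my ($self, $name) = @_; return $self->_lookup(ko => $name) }
sub zh { my ($self, $name) = @_; return $self->_lookup(zh => $name) }

package main;

my $char = Char::Name->new;
binmode STDOUT, ':encoding(UTF-8)';
print $char->jp("sei-ikiru"), "\n";
```

Unlike a pragma, an object like this can be created, configured and swapped at run time, which is the flexibility being asked for.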
I know Japanese is the biggest nightmare for naming characters, because in
Japanese we give too many "names" to each character; it's really hard
to disambiguate them....
I may come up with something as I look through Unihan DB, now accessible
via CPAN (Unicode::Unihan)....
Cheers,
Philip Newton (不衣律不入豚)
\x{5c0f}\x{98fc} \x{5f3e}
--
H.Merijn Brand Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)
using perl-5.6.1, 5.7.2 & 630 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
WinNT 4, Win2K pro & WinCE 2.11 often with Tk800.022 &/| DBD-Unify
ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/H/HM/HMBRAND/