perl-unicode

Re: Unicode. Perl does the right thing?

2002-10-26 05:30:05
dankogai(_at_)dan(_dot_)co(_dot_)jp (Dan Kogai) wrote in
news:1E456D1E-E7DE-11D6-BF8B-0003939A104C(_at_)dan(_dot_)co(_dot_)jp: 

On Friday, Oct 25, 2002, at 14:10 Asia/Tokyo, Philip Newton wrote:
Well, partially because there are no "good" names for many of the
characters. What do you call "生"? "CJK UNIFIED IDEOGRAPH-751F"?
(That's the current Unicode "name", but it's not particularly useful.)
"CJK shou"? "CJK sei"? "CJK sheng1"? "CJK saeng"? "CJK ikiru"? ikasu,
ikeru, umareru, umu, ou, haeru, hayasu, ki, nama, naru, nasu, musu,
.... which one do you pick?
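
For reference, the official name mentioned above can be looked up with the core charnames module. This is only a minimal sketch, assuming a reasonably recent perl in which charnames::viacode and charnames::vianame handle the algorithmically named CJK ideographs:

```perl
use strict;
use warnings;
use charnames ();   # load the lookup functions without enabling \N{...}

# Code point -> official Unicode name.
my $name = charnames::viacode(0x751F);
print "$name\n";

# Official Unicode name -> code point.
my $code = charnames::vianame("CJK UNIFIED IDEOGRAPH-751F");
printf "U+%04X\n", $code;
```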

If we are stuck with the de jure, ex officio names from the Unicode
Consortium, we are out of luck. But this is Perl: if there is more than
one way to do it, why not more than one way to name it? I am kind of
wondering about a charnames extension that goes like

use charnames ":ja"; # Japanese
print "\N{sei-ikiru}";
#
use charnames ":ko";
print "\N{saeng}";
#
use charnames ":zh";
print "\N{sheng1}";

All ideal for the new aliasing module!

use charnames ":full", ":alias" => "ja";

which is the same as

use charnames ":alias" => ":ja";

Maybe we should supply such alias files, which place no restriction on the
number of aliases for the same long name.

Assuming sei-ikiru's real name is "CHINESE BLUBBER POND WITH FROGS", there
is no problem with

use charnames ":full", ":alias" => {
        "sei-ikiru" => "CHINESE BLUBBER POND WITH FROGS",
        "saeng"     => "CHINESE BLUBBER POND WITH FROGS",
        "sheng1"    => "CHINESE BLUBBER POND WITH FROGS",
        "frog-pond" => "CHINESE BLUBBER POND WITH FROGS",
      };
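
The same idea with the character's actual Unicode name instead of the placeholder might look like the following. It is a sketch only, assuming a perl whose charnames supports ":alias" with a hash reference and resolves "CJK UNIFIED IDEOGRAPH-751F" under ":full"; the three alias names are just the readings quoted earlier in this thread, not an agreed naming scheme:

```perl
use strict;
use warnings;

# Several aliases may point at the same official name.
use charnames ":full", ":alias" => {
    "sei-ikiru" => "CJK UNIFIED IDEOGRAPH-751F",
    "saeng"     => "CJK UNIFIED IDEOGRAPH-751F",
    "sheng1"    => "CJK UNIFIED IDEOGRAPH-751F",
};

# Each alias should resolve to the same code point.
my @chars = ("\N{sei-ikiru}", "\N{saeng}", "\N{sheng1}");
printf "U+%04X\n", ord($_) for @chars;
```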

Forgive my ignorance of Korean, Japanese, Chinese and CJK encodings in
general. Just pointing out the new wealth of possibilities.

Now we can support

unicore/ko_alias.pl
unicore/ja_alias.pl
unicore/zh_alias.pl
...

Since the pragma approach is rather inflexible, I would prefer an OO
approach, like

use Char::Name;

my $char = Char::Name->new;

print $char->jp("sei-ikiru");
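
To make the OO idea concrete, here is one way such a class could be stubbed out in a single file. Char::Name is hypothetical (no such module is assumed to exist); the ko and zh methods, the _lookup helper and the tiny lookup table are made up for illustration, with only jp taken from the sketch above:

```perl
use strict;
use warnings;

package Char::Name;   # hypothetical class, defined inline for this sketch

# Illustrative table only: language => { reading => code point }.
my %READINGS = (
    jp => { "sei-ikiru" => 0x751F },
    ko => { "saeng"     => 0x751F },
    zh => { "sheng1"    => 0x751F },
);

sub new {
    my ($class) = @_;
    return bless {}, $class;
}

# Shared helper: return the character for a reading, or undef if unknown.
sub _lookup {
    my ($lang, $reading) = @_;
    my $code = $READINGS{$lang}{$reading};
    return defined $code ? chr($code) : undef;
}

# One accessor per language (ko and zh are assumed names).
sub jp { my ($self, $reading) = @_; return _lookup("jp", $reading) }
sub ko { my ($self, $reading) = @_; return _lookup("ko", $reading) }
sub zh { my ($self, $reading) = @_; return _lookup("zh", $reading) }

package main;

my $char = Char::Name->new;
binmode STDOUT, ":encoding(UTF-8)";   # the result is a wide character
print $char->jp("sei-ikiru"), "\n";
```

Unlike the pragma, the lookup here happens at run time, so the reading can come from a variable rather than a string literal.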

I know Japanese is the biggest nightmare for naming characters, because in
Japanese we give too many "names" to each character; it's really hard
to disambiguate these....

I may come up with something as I look through the Unihan DB, now accessible
via CPAN (Unicode::Unihan)....

Cheers,
Philip Newton (不衣律不入豚)

\x{5c0f}\x{98fc} \x{5f3e}





-- 
H.Merijn Brand    Amsterdam Perl Mongers (http://www.amsterdam.pm.org/)
using perl-5.6.1, 5.7.2 & 630 on HP-UX 10.20 & 11.00, AIX 4.2, AIX 4.3,
     WinNT 4, Win2K pro & WinCE 2.11 often with Tk800.022 &/| DBD-Unify
ftp://ftp.funet.fi/pub/languages/perl/CPAN/authors/id/H/HM/HMBRAND/