perl-unicode

Re: Unicode / Japanese and Transliteration problem

2002-01-31 05:44:16
On Thu, 31 Jan 2002 12:31:58 +0000
Jean-Michel Hiver <jhiver(_at_)mkdoc(_dot_)com> wrote:

  However, I'm having quite a lot of trouble with Japanese because of
  kanji (Chinese ideograms).  ICU does provide Hiragana <=> Latin and
  Katakana <=> Latin, but doesn't do anything about kanji, which does
  not surprise me too much given that in Japanese a kanji very often
  has more than one pronunciation depending on how and where it's
  used.

  Another problem with Japanese is that words do not seem to be
  separated by spaces.  Therefore, even if the transliteration worked
  for kanji, I'd end up with very long unbroken strings, which is not
  good for indexing when you try to split text into keywords.

  Any ideas?

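Try kakasi or ChaSen. kakasi converts kanji to kana or romaji and can
also split text into words; ChaSen is a morphological analyzer that
segments text into words. Both can be accessed from Perl through XS
wrappers.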

kakasi
http://kakasi.namazu.org/
ChaSen
http://chasen.aist-nara.ac.jp/
namazu
http://namazu.org/


--
Tatsuhiko Miyagawa <miyagawa(_at_)bulknews(_dot_)net>
