perl-i18n

Re: ordering Japanese

2006-05-04 21:17:21

On 4 May 2006, at 8:12, Mike Barborak wrote:

Here is an example of 13 store names ordered with
Lingua::JA::Sort::JIS::msort:

1. 伊勢丹 JR京都店
2. アペックス 福山
3. アミュプラザ 鹿児島
4. オクノ 旭川
5. さくら野百貨店 仙台
6. さつま屋 鹿児島
7. スタンス 米子
8. そごう 神戸店
9. そごう 千葉店
10. そごう 大宮店
11. そごう 横浜店
12. ダイアモンドシティアルル 橿原
13. ニューズ 熊本

My client tells me that entry 1 should actually come after the 3rd entry and
before the fourth.

He is right. Japanese words are usually sorted by their pronunciation (shown in [...] below).

1. [あぺっくす] アペックス 福山*
2. [あみゅぷらざ] アミュプラザ 鹿児島
3. [いせたんじぇいあーるきょうとてん] 伊勢丹 JR京都店
4. [おくの] オクノ 旭川
5. [さくらのひゃっかてん] さくら野百貨店 仙台
6. [さつまや] さつま屋 鹿児島
7. [すたんす] スタンス 米子
8. [そごうおおみやてん] そごう 大宮店
9. [そごうこうべてん] そごう 神戸店
10. [そごうちばてん] そごう 千葉店
11. [そごうよこはまてん] そごう 横浜店
12. [だいやもんどしてぃあるる] ダイアモンドシティアルル 橿原
13. [にゅーず] ニューズ 熊本

* If there is another アペックス, e.g. アペックス広島, the sort key has to include the location as well: [あぺっくすふくやま].
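
A minimal sketch of the idea, written in Python only for illustration (the same approach works in Perl by sorting on a key): each name is paired with a hand-supplied reading and the list is sorted on the reading. The names STORES and sort_by_reading are hypothetical, and the plain code point comparison used here is only an approximation of proper kana collation, which treats voiced marks and small kana differently.

```python
# Store names paired with the readings given in the list above.
# The readings are supplied by hand; nothing here derives them from the kanji.
STORES = [
    ("伊勢丹 JR京都店", "いせたんじぇいあーるきょうとてん"),
    ("アペックス 福山", "あぺっくす"),
    ("アミュプラザ 鹿児島", "あみゅぷらざ"),
    ("オクノ 旭川", "おくの"),
    ("そごう 神戸店", "そごうこうべてん"),
    ("そごう 千葉店", "そごうちばてん"),
    ("そごう 大宮店", "そごうおおみやてん"),
]

def sort_by_reading(entries):
    """Sort (name, reading) pairs by reading, using the name as a tie-breaker."""
    # Assumption: comparing hiragana by code point is close enough for this sample.
    return sorted(entries, key=lambda entry: (entry[1], entry[0]))

for name, reading in sort_by_reading(STORES):
    print(f"[{reading}] {name}")
```

With these readings, 伊勢丹 lands after アミュプラザ and before オクノ, which is the position the client asked for.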

From this description on manyogana, I'm thinking they're saying that collation of the glyph 伊 should be based on its katakana adaptation イ which makes sense:

http://en.wikipedia.org/wiki/Manyogana

I'm not an expert on Japanese language and literature. But as far as modern Japanese is concerned, I think it is inappropriate to associate the pronunciation of a kanji (the Chinese and pseudo-Chinese characters used in Japanese) with a man'yogana. 伊勢 is a common proper name and is pronounced いせ.

3. Is the solution to first convert the manyogana characters to katakana and then do the msort?

Yes.

If so does anyone know of a Perl module to do this or a nice reference that I could use more programmatically than the image on the link above?

I don't know of one, and I'm afraid there is no such module. Giving a pronunciation to all common kanji words would require a large dictionary...
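
To make the dictionary problem concrete, here is a rough sketch (in Python, purely for illustration). Katakana can be turned into hiragana mechanically, because the two Unicode blocks are offset by 0x60, but kanji need a lookup table. KANJI_READINGS, katakana_to_hiragana and reading_for are hypothetical names, and the two dictionary entries are just the readings from the example above.

```python
# Hypothetical mini-dictionary: kanji words mapped to hiragana readings.
# A real application would need a far larger dictionary.
KANJI_READINGS = {
    "伊勢丹": "いせたん",
    "京都店": "きょうとてん",
}

def katakana_to_hiragana(text):
    """Shift katakana U+30A1..U+30F6 down by 0x60 to the matching hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )

def reading_for(word):
    """Return a hiragana sort key for one word, or raise if a reading is missing."""
    if word in KANJI_READINGS:
        return KANJI_READINGS[word]
    kana = katakana_to_hiragana(word)
    # Anything outside the hiragana block (plus the long-vowel mark) is unresolved.
    for ch in kana:
        if not ("\u3041" <= ch <= "\u3096" or ch == "ー"):
            raise KeyError(f"no reading known for {word!r}")
    return kana

print(reading_for("アペックス"))
print(reading_for("伊勢丹"))
```

A word such as 旭川 raises KeyError here, which shows where the dictionary would have to grow.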

4. Can anyone think of any other glyphs or classes of Japanese glyphs similar to manyogana that I should be worried about?

Romaji -- JR in "JR京都店" in your example.
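
Latin letters could be handled with a small table of letter names, sketched below in Python for illustration. LETTER_READINGS and spell_out_latin are hypothetical names, and the table holds only J and R, taken from the reading [じぇいあーる] given for JR earlier in this message; the other letters would have to be filled in.

```python
# Hypothetical table of letter names in hiragana. Only J and R are taken from
# the reading given for "JR" earlier in this message; other letters are omitted.
LETTER_READINGS = {"J": "じぇい", "R": "あーる"}

def spell_out_latin(text):
    """Replace each known Latin letter with its kana reading; leave the rest."""
    return "".join(LETTER_READINGS.get(ch.upper(), ch) for ch in text)

print(spell_out_latin("JR京都店"))  # じぇいあーる京都店
```

The kanji that remain would still need a dictionary lookup before sorting.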



Kino
