On 4 May 2006, at 8:12, Mike Barborak wrote:
Here is an example of 13 store names ordered with
Lingua::JA::Sort::JIS::msort:
1. 伊勢丹 JR京都店
2. アペックス 福山
3. アミュプラザ 鹿児島
4. オクノ 旭川
5. さくら野百貨店 仙台
6. さつま屋 鹿児島
7. スタンス 米子
8. そごう 神戸店
9. そごう 千葉店
10. そごう 大宮店
11. そごう 横浜店
12. ダイアモンドシティアルル 橿原
13. ニューズ 熊本
My client tells me that entry 1 should actually come after the third
entry and before the fourth.
He is right. Japanese words are usually sorted by their pronunciation
(shown in [...] below).
1. [あぺっくす] アペックス 福山*
2. [あみゅぷらざ] アミュプラザ 鹿児島
3. [いせたんじぇいあーるきょうとてん] 伊勢丹 JR京都店
4. [おくの] オクノ 旭川
5. [さくらのひゃっかてん] さくら野百貨店 仙台
6. [さつまや] さつま屋 鹿児島
7. [すたんす] スタンス 米子
8. [そごうおおみやてん] そごう 大宮店
9. [そごうこうべてん] そごう 神戸店
10. [そごうちばてん] そごう 千葉店
11. [そごうよこはまてん] そごう 横浜店
12. [だいやもんどしてぃあるる] ダイアモンドシティアルル 橿原
13. [にゅーず] ニューズ 熊本
* If there is another アペックス, e.g. アペックス広島, you have to use
the reading that includes the location, [あぺっくすふくやま].
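A minimal Python sketch of sorting by reading rather than by written form, assuming the readings have already been entered by hand. The readings are copied from the list above; the second branch "アペックス 広島" is invented to show why the footnote's longer reading is needed. Python compares strings by Unicode code point, which for plain hiragana roughly follows gojūon order but is not a full JIS X 4061 collation.

```python
# (sort reading, display name). Readings are hand-entered; the 広島 branch
# is a hypothetical second アペックス used to show the need for a longer key.
STORES = [
    ("いせたんじぇいあーるきょうとてん", "伊勢丹 JR京都店"),
    ("あぺっくすふくやま", "アペックス 福山"),
    ("あみゅぷらざ", "アミュプラザ 鹿児島"),
    ("おくの", "オクノ 旭川"),
    ("そごうこうべてん", "そごう 神戸店"),
    ("そごうおおみやてん", "そごう 大宮店"),
    ("あぺっくすひろしま", "アペックス 広島"),  # hypothetical duplicate name
]


def by_reading(stores):
    """Return display names ordered by their hiragana reading.

    Tuples sort on the reading first. Code-point order only approximates
    gojuon order: long-vowel marks, small kana and voiced kana are not
    given the special treatment a JIS X 4061 collation would apply.
    """
    return [display for _reading, display in sorted(stores)]


if __name__ == "__main__":
    for name in by_reading(STORES):
        print(name)
```

With the location included in the key, the two アペックス branches come first (広島 before 福山, since ひ precedes ふ), and 伊勢丹 lands between アミュプラザ and オクノ, as your client expects.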
From this description of manyogana, I think they're saying that the
glyph 伊 should be collated according to its katakana adaptation イ,
which makes sense:
http://en.wikipedia.org/wiki/Manyogana
I'm not an expert in Japanese language and literature, but as far as
modern Japanese is concerned, I think it is inappropriate to
associate the pronunciation of a kanji (a Chinese character, or a
pseudo-Chinese character coined in Japan) with man'yogana. 伊勢 is a
common proper name and is pronounced いせ.
3. Is the solution to first convert the manyogana characters to
katakana and then do the msort?
Yes.
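If you end up sorting the readings yourself with a plain code-point comparison rather than msort, it may also help to fold katakana into hiragana first, so the same sound sorts the same way whichever script a name is written in. A minimal Python sketch, assuming the kanji have already been replaced by kana; the function name kata_to_hira is my own and not from any module mentioned here.

```python
def kata_to_hira(text: str) -> str:
    """Map katakana U+30A1..U+30F6 to the hiragana 0x60 code points lower.

    Kanji, Latin letters and the long-vowel mark ー (U+30FC) are passed
    through unchanged, so they still need separate handling.
    """
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


if __name__ == "__main__":
    names = ["さくら野百貨店", "オクノ"]
    # Raw code points put さ (U+3055) before オ (U+30AA).
    print(sorted(names))                    # ['さくら野百貨店', 'オクノ']
    # After folding, お (U+304A) sorts before さ, as in gojuon order.
    print(sorted(names, key=kata_to_hira))  # ['オクノ', 'さくら野百貨店']
```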
If so, does anyone know of a Perl module to do this, or of a good
reference that I could use more programmatically than the image at
the link above?
I don't know of one, and I'm afraid there is no such module. Giving a
pronunciation to every common kanji word would require a large
dictionary...
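To give an idea of what such a dictionary lookup involves, here is a toy Python sketch: a greedy longest-match converter over a hypothetical five-entry dictionary (READINGS and to_reading are made-up names). Greedy matching can mis-segment, and proper names often have readings that no general dictionary predicts, so this is only an illustration of the approach.

```python
# Hypothetical miniature reading dictionary; a usable one would need
# many thousands of entries and still miss some proper names.
READINGS = {
    "伊勢丹": "いせたん",
    "京都": "きょうと",
    "店": "てん",
    "大宮": "おおみや",
    "神戸": "こうべ",
}


def to_reading(text: str, dictionary: dict[str, str]) -> str:
    """Greedy longest-match conversion of text to a kana reading.

    Characters with no dictionary entry (kana, spaces, Latin letters,
    unknown kanji) are copied through unchanged.
    """
    longest = max(map(len, dictionary), default=0)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in dictionary:
                out.append(dictionary[chunk])
                i += size
                break
        else:
            # No entry starts here: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(to_reading("伊勢丹 JR京都店", READINGS))  # いせたん JRきょうとてん
```

Note that the JR is left untouched; Latin letters need their own rule.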
4. Can anyone think of any other glyphs or classes of Japanese
glyphs similar to manyogana that I should be worried about?
Romaji, such as the JR in "JR京都店" in your example (read letter by
letter as じぇいあーる).
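For initialisms like JR, one option is to spell each Latin letter out by its Japanese letter name, which gives the じぇいあーる used in the list above. A minimal Python sketch; the letter-name table is my own choice (several letters, e.g. H, K and Z, have accepted variants), and reading every Latin run letter by letter is an assumption that fails for romanized words meant to be read as words.

```python
import unicodedata

# One common set of letter-name readings, written in hiragana (assumption;
# variants exist for several letters).
LETTER_READINGS = {
    "A": "えー", "B": "びー", "C": "しー", "D": "でぃー", "E": "いー",
    "F": "えふ", "G": "じー", "H": "えいち", "I": "あい", "J": "じぇい",
    "K": "けー", "L": "える", "M": "えむ", "N": "えぬ", "O": "おー",
    "P": "ぴー", "Q": "きゅー", "R": "あーる", "S": "えす", "T": "てぃー",
    "U": "ゆー", "V": "ぶい", "W": "だぶりゅー", "X": "えっくす", "Y": "わい",
    "Z": "ぜっと",
}


def spell_latin(text: str) -> str:
    """Replace each Latin letter with its letter-name reading.

    NFKC first folds full-width letters such as "Ｊ" to ASCII; lowercase
    letters are looked up via upper(). Everything else passes through.
    """
    folded = unicodedata.normalize("NFKC", text)
    return "".join(LETTER_READINGS.get(ch.upper(), ch) for ch in folded)


if __name__ == "__main__":
    print(spell_latin("JR京都店"))  # じぇいあーる京都店
```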
Kino
☯