perl-i18n

ordering Japanese

2006-05-03 16:13:11
Hi,

I am using Sadahiro Tomoyuki's Lingua::JA::Sort::JIS module to sort Japanese
names of stores. I have come close to the order my client has asked for, but
I am having some difficulty matching their request exactly.
The problem seems to be collating kana glyphs with manyogana glyphs. (Please
excuse me if I am misusing any terms - this is my first introduction to
Japanese.)

Here is an example of 13 store names ordered with
Lingua::JA::Sort::JIS::msort:

1. 伊勢丹 JR京都店
2. アペックス 福山
3. アミュプラザ 鹿児島
4. オクノ 旭川
5. さくら野百貨店 仙台
6. さつま屋 鹿児島
7. スタンス 米子
8. そごう 触焉扤扤妖\xB9
9. そごう 千葉店
10. そごう 大宮店
11. そごう 横浜店
12. エ焉扤扤ぅ▲皀鵐疋轡謄\x{2197}▲襯襦ヽ犖\xB6
13. ニューズ 熊本

My client tells me that entry 1 should actually come between entries 3 and
4. Going by the description of manyogana linked below, I think they are
saying that the glyph 伊 should be collated as its katakana adaptation イ,
which makes sense:

http://en.wikipedia.org/wiki/Manyogana

Note that I'm basing many of my statements on staring at and comparing these
glyphs online, so I might be far off.

So my questions are:

1. Is my client correct in their ordering?
2. I believe I've tried all the combinations of collation levels and kanji
classes in the Lingua::JA::Sort::JIS jcmp function but have not achieved the
desired ordering. Have I perhaps missed the correct combination?
3. Is the solution to first convert the manyogana characters to katakana and
then do the msort? If so does anyone know of a Perl module to do this or a
nice reference that I could use more programmatically than the image on the
link above?
4. Can anyone think of any other glyphs or classes of Japanese glyphs
similar to manyogana that I should be worried about?
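
To make question 3 concrete, here is a rough sketch of what I have in mind.
Everything in it is an assumption on my part: the %reading table is a
hypothetical, hand-maintained mapping (I only know of the 伊 => イ entry so
far), the sample list is just a few names including one that starts with 伊,
and plain string comparison stands in for msort only to keep the example
self-contained.

```perl
use strict;
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical table: leading kanji => katakana it should collate as.
my %reading = (
    '伊' => 'イ',
);

# Build a sort key: swap a known leading kanji for its katakana,
# then fold hiragana to katakana so both scripts collate together.
sub sort_key {
    my ($name) = @_;
    for my $kanji (keys %reading) {
        if (index($name, $kanji) == 0) {
            substr($name, 0, length($kanji), $reading{$kanji});
            last;
        }
    }
    (my $key = $name) =~ tr/ぁ-ん/ァ-ン/;
    return $key;
}

my @stores = (
    '伊勢丹 JR京都店',
    'アペックス 福山',
    'アミュプラザ 鹿児島',
    'さつま屋 鹿児島',
    'ニューズ 熊本',
);

# Decorate, sort on the key, undecorate. The comparison here is by code
# point; in the real program the keys would be compared with the
# Lingua::JA::Sort::JIS routines instead.
my @sorted = map  { $_->[1] }
             sort { $a->[0] cmp $b->[0] }
             map  { [ sort_key($_), $_ ] } @stores;

print "$_\n" for @sorted;
```

The original names are what gets printed; only the keys are rewritten, so
nothing the client sees would change except the order.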

Thanks for any help you can give me!

Best,
Mike

P.S. The attachment is this same email but as a UTF-8 text file.

Attachment: question.utf8
Description: Binary data
