Hangul decomposition and composition

2003-09-12 05:30:04

I've recently updated Lingua::KO::Hangul::Util [1]
and added two functions: decomposeJamo(), which decomposes
complex jamo into simple jamo, and composeJamo(), which
composes simple jamo into complex jamo.
(These are simply new functions added to an existing package.)
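
To make the idea of jamo decomposition concrete, here is a minimal
sketch in Python (not the Perl module itself). The table DECOMPOSE and
the helpers decompose_jamo() and code_points() are hypothetical names
for illustration; the table holds only the mappings implied by the
examples in this post, whereas the real module covers many more jamo.

```python
# Illustrative table only: mappings taken from the examples in this post.
DECOMPOSE = {
    "\u1168": "\u1167\u1175",        # jungseong YE   -> YEO, I
    "\u1176": "\u1161\u1169",        # jungseong A-O  -> A, O
    "\u1181": "\u1169\u1167\u1175",  # jungseong O-YE -> O, YEO, I
    "\u11C3": "\u11A8\u11AF",        # jongseong KIYEOK-RIEUL -> KIYEOK, RIEUL
}


def decompose_jamo(text):
    """Replace each complex jamo found in DECOMPOSE by its simple jamo."""
    return "".join(DECOMPOSE.get(ch, ch) for ch in text)


def code_points(text):
    """Format a string as a comma-separated list of U+XXXX code points."""
    return ",".join("U+%04X" % ord(ch) for ch in text)


print(code_points(decompose_jamo("\u1100\u1176")))  # U+1100,U+1161,U+1169
```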

Jamo composition in this module works differently from
that of a tweaked Unicode::Normalize [2].

Lingua::KO::Hangul::Util::composeJamo() composes
a sequence of simple jamo that share the same Hangul syllable type [3],
i.e., L{n} to L, V{n} to V, and T{n} to T.
This implementation follows K. Kim's proposal [4].
U+1181 and U+118C are also composed properly.

Composition of U+1169,U+1167,U+1175:
    <jungseong O, jungseong YEO, jungseong I>
  via LKHU => U+1181.
    <jungseong O-YE>
  via U::N => U+1169,U+1168.
    <jungseong O, jungseong YE>
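
The same-type rule (L{n} to L, V{n} to V, T{n} to T) can be sketched
roughly as follows in Python. This is not the module's code: the table
COMPOSE holds only two mappings taken from the example above, the
helper names are hypothetical, and trying the longest candidate first
within each run is my assumption about how such a table would be
applied.

```python
from itertools import groupby

# Illustrative table only: the two vowel sequences from the example above.
COMPOSE = {
    "\u1169\u1167\u1175": "\u1181",  # O, YEO, I -> jungseong O-YE
    "\u1167\u1175": "\u1168",        # YEO, I    -> jungseong YE
}


def jamo_type(ch):
    """Classify a character as L, V or T by its range in the Hangul Jamo block."""
    cp = ord(ch)
    if 0x1100 <= cp <= 0x115F:
        return "L"
    if 0x1160 <= cp <= 0x11A7:
        return "V"
    if 0x11A8 <= cp <= 0x11FF:
        return "T"
    return None


def compose_jamo(text):
    """Compose jamo only within runs of the same type, longest match first."""
    out = []
    for kind, group in groupby(text, key=jamo_type):
        run = "".join(group)
        if kind is None:          # not a jamo: copy unchanged
            out.append(run)
            continue
        i = 0
        while i < len(run):
            for j in range(len(run), i + 1, -1):   # candidates of length >= 2
                if run[i:j] in COMPOSE:
                    out.append(COMPOSE[run[i:j]])
                    i = j
                    break
            else:                 # nothing matched: keep the single jamo
                out.append(run[i])
                i += 1
    return "".join(out)


print("U+%04X" % ord(compose_jamo("\u1169\u1167\u1175")))  # U+1181
```

Because the whole V run is examined at once, O, YEO, I becomes O-YE
rather than O followed by YE.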

In contrast, the tweaked Unicode::Normalize prefers L+V composition
(L + V => LV) to V+V composition, since the algorithm of UAX #15
composes characters from left to right (in LVV, LV lies further left than VV).

When an old (non-modern) complex jamo is involved,
the results of the two compositions differ as follows:

Composition of U+1100,U+1161,U+1169:
    <choseong KIYEOK, jungseong A, jungseong O>
  via LKHU => U+1100,U+1176.
    <choseong KIYEOK, jungseong A-O>
  via U::N => U+AC00,U+1169.
    <syllable GA, jungseong O>
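
The L+V-first behaviour can be observed with any standard NFC
implementation, for instance Python's unicodedata module (which is
the untweaked UAX #15 algorithm, not the tweaked Unicode::Normalize
discussed here). The helper code_points() is just a hypothetical
formatting function.

```python
import unicodedata


def code_points(text):
    """Format a string as a comma-separated list of U+XXXX code points."""
    return ",".join("U+%04X" % ord(ch) for ch in text)


# <choseong KIYEOK, jungseong A, jungseong O>
src = "\u1100\u1161\u1169"
nfc = unicodedata.normalize("NFC", src)

# L + V is composed into the syllable GA; the second vowel is left alone.
print(code_points(nfc))  # U+AC00,U+1169
```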

A similar situation occurs in the case of L+V+T+T
where L+V constitutes a modern syllable:

Composition of U+1100,U+1161,U+11A8,U+11AF:
    <choseong KIYEOK, jungseong A, jongseong KIYEOK, jongseong RIEUL>
  via LKHU => U+1100,U+1161,U+11C3.
    <choseong KIYEOK, jungseong A, jongseong KIYEOK-RIEUL>
  via U::N => U+AC01,U+11AF.
    <syllable GAG, jongseong RIEUL>
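
The U::N results above follow from the arithmetic Hangul syllable
composition in the Unicode Standard, applied pairwise from left to
right. Below is a minimal Python sketch of that Hangul part only; it
ignores combining classes and every other part of canonical
composition, and the function names are hypothetical.

```python
# Constants for Hangul syllable composition from the Unicode Standard.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def compose_pair(a, b):
    """Return the syllable for L+V or LV+T, or None if the pair does not compose."""
    a, b = ord(a), ord(b)
    l, v = a - L_BASE, b - V_BASE
    if 0 <= l < L_COUNT and 0 <= v < V_COUNT:        # L + V => LV
        return chr(S_BASE + (l * V_COUNT + v) * T_COUNT)
    s, t = a - S_BASE, b - T_BASE
    if 0 <= s < S_COUNT and s % T_COUNT == 0 and 0 < t < T_COUNT:
        return chr(a + t)                            # LV + T => LVT
    return None


def compose_left_to_right(text):
    """Merge each character into the previous output character when possible."""
    out = []
    for ch in text:
        merged = compose_pair(out[-1], ch) if out else None
        if merged is None:
            out.append(ch)
        else:
            out[-1] = merged
    return "".join(out)


result = compose_left_to_right("\u1100\u1161\u11A8\u11AF")
print(",".join("U+%04X" % ord(c) for c in result))  # U+AC01,U+11AF
```

Once L+V+T has been merged into the syllable GAG, the remaining
jongseong RIEUL has nothing to its left that it can combine with, so
it stays as a separate character.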

[4] (full); (summary)


  • Hangul decomposition and composition, SADAHIRO Tomoyuki <=