Hello, everyone.
This is a proposal for a new module.
Name : ???
SYNOPSIS
This module provides the following functions
to handle Hangul Syllables (and Jamo) in Unicode.
decomposeHangul
composeHangul
getHangulName
parseHangulName
These functions should be useful for implementing
many things concerning Unicode,
including charnames.pm, UnicodeCD.pm, ...,
and future Normalization and Collation modules.
DESCRIPTION
$decomposed_string = decomposeHangul($u_integer);
@u_integers = decomposeHangul($u_integer);
ex.)
decomposeHangul(0xAC00) # a CV syllable
returns "\x{1100}\x{1161}"
or (0x1100, 0x1161);
decomposeHangul(0xAE00) # a CVC syllable
returns "\x{1100}\x{1173}\x{11AF}"
or (0x1100, 0x1173, 0x11AF);
decomposeHangul(0x0041) # outside of Hangul Syllables
returns empty string or empty list.
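The arithmetic behind these results can be sketched as follows. This is only a minimal sketch, assuming the constants (SBase, LBase, ...) from the Hangul annex of UTR #15 and using wantarray to choose between the string and list return forms; it is not a finished implementation.

```perl
use strict;
use warnings;

# Constants as given in the Hangul annex of UTR #15.
use constant {
    SBase  => 0xAC00, LBase  => 0x1100,
    VBase  => 0x1161, TBase  => 0x11A7,
    LCount => 19,     VCount => 21,     TCount => 28,
};
use constant NCount => VCount * TCount;   # 588
use constant SCount => LCount * NCount;   # 11172

sub decomposeHangul {
    my ($code) = @_;
    my $sindex = $code - SBase;

    # Outside U+AC00..U+D7A3: empty list, or empty string in scalar context.
    return wantarray ? () : '' if $sindex < 0 || $sindex >= SCount;

    my @jamo = (
        LBase + int($sindex / NCount),               # initial consonant
        VBase + int(($sindex % NCount) / TCount),    # medial vowel
    );
    my $tindex = $sindex % TCount;
    push @jamo, TBase + $tindex if $tindex;          # final consonant, if any

    return wantarray ? @jamo : join '', map { chr } @jamo;
}
```

The caller decides the form by context: `my @cp = decomposeHangul(0xAE00)` gives the code points, `my $str = decomposeHangul(0xAE00)` gives the string.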
$hangul_composed_string = composeHangul($src_string);
ex.)
composeHangul("Hangul \x{1100}\x{1161}\x{1100}\x{1173}\x{11AF}")
returns "Hangul \x{AC00}\x{AE00}";
Any characters other than Hangul Jamo and Hangul Syllables
are unaffected.
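A rough sketch of the composition step, again assuming the UTR #15 constants. It walks the code points once, folding L+V into an LV syllable and LV+T into an LVT syllable. The trailing-consonant test here is 0 < TIndex < TCount, so that the filler position TBase itself is never combined; that bound is my own choice for this sketch.

```perl
use strict;
use warnings;

# Constants as given in the Hangul annex of UTR #15.
use constant {
    SBase  => 0xAC00, LBase  => 0x1100,
    VBase  => 0x1161, TBase  => 0x11A7,
    LCount => 19,     VCount => 21,     TCount => 28,
};
use constant NCount => VCount * TCount;
use constant SCount => LCount * NCount;

sub composeHangul {
    my ($src) = @_;
    my @out;
    for my $ch (map { ord } split //, $src) {
        if (@out) {
            my $last = $out[-1];

            # 1. Leading consonant followed by a vowel: make an LV syllable.
            my $lindex = $last - LBase;
            my $vindex = $ch - VBase;
            if (   $lindex >= 0 && $lindex < LCount
                && $vindex >= 0 && $vindex < VCount) {
                $out[-1] = SBase + ($lindex * VCount + $vindex) * TCount;
                next;
            }

            # 2. LV syllable followed by a trailing consonant: make LVT.
            my $sindex = $last - SBase;
            my $tindex = $ch - TBase;
            if (   $sindex >= 0 && $sindex < SCount
                && $sindex % TCount == 0
                && $tindex > 0 && $tindex < TCount) {
                $out[-1] = $last + $tindex;
                next;
            }
        }
        push @out, $ch;    # anything else is copied through unchanged
    }
    return join '', map { chr } @out;
}
```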
$name = getHangulName($u_integer);
ex.)
getHangulName(0xAC00) # a CV syllable
returns "HANGUL SYLLABLE GA";
getHangulName(0xAE00) # a CVC syllable
returns "HANGUL SYLLABLE GEUL";
getHangulName(0x0041) # outside of Hangul Syllables
returns undef.
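For illustration, name generation could look like the sketch below. It assumes the short jamo names listed in the Unicode data (the tables @JamoL, @JamoV, @JamoT are my own naming) and the same index arithmetic as decomposition.

```perl
use strict;
use warnings;

use constant { SBase => 0xAC00, LCount => 19, VCount => 21, TCount => 28 };
use constant NCount => VCount * TCount;
use constant SCount => LCount * NCount;

# Short jamo names; the empty string stands for "no consonant".
my @JamoL = (qw(G GG N D DD R M B BB S SS), '', qw(J JJ C K T P H));
my @JamoV = qw(A AE YA YAE EO E YEO YE O WA WAE OE YO U WEO WE WI YU EU YI I);
my @JamoT = ('', qw(G GG GS N NJ NH D L LG LM LB LS LT LP LH
                    M B BS S SS NG J C K T P H));

sub getHangulName {
    my ($code) = @_;
    my $sindex = $code - SBase;
    return undef if $sindex < 0 || $sindex >= SCount;   # not a Hangul Syllable

    return 'HANGUL SYLLABLE '
         . $JamoL[ int($sindex / NCount) ]
         . $JamoV[ int(($sindex % NCount) / TCount) ]
         . $JamoT[ $sindex % TCount ];
}
```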
$u_integer = parseHangulName($name);
ex.)
parseHangulName("HANGUL SYLLABLE GA")
or parseHangulName("GA") returns 0xAC00;
parseHangulName("HANGUL SYLLABLE GEUL")
or parseHangulName("GEUL") returns 0xAE00;
parseHangulName("LATIN SMALL LETTER A") returns undef.
Caveat:
parseHangulName("A") returns 0xC544,
just as parseHangulName("HANGUL SYLLABLE A") does,
but parseHangulName("G") returns undef
because there is no "HANGUL SYLLABLE G".
IMPLEMENTATION
cf. Annex 10: Hangul,
in Unicode Normalization Forms (UTR #15)
http://www.unicode.org/unicode/reports/tr15
Algorithms for decomposeHangul, composeHangul,
and getHangulName are given in UTR #15.
The algorithm for parseHangulName is simple.
The regex
/^
(?:HANGUL\ SYLLABLE\ )?
([^AEIOUWY]*)([AEIOUWY]+)([^AEIOUWY]*)
$/x
splits a syllable name into the corresponding
short jamo names in the order of initial, medial, final.
(NB: the initial and final jamo names may be zero-length,
cf. "HANGUL SYLLABLE WA")
Then, if *all* the short jamo names are legal,
the syllable name is legal.
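Put together, the parsing step might look like this. It is a sketch only: the lookup hashes %LIndex, %VIndex, %TIndex are names I made up here, built from the short jamo names, and the regex is the one shown above.

```perl
use strict;
use warnings;

use constant { SBase => 0xAC00, VCount => 21, TCount => 28 };

my @JamoL = (qw(G GG N D DD R M B BB S SS), '', qw(J JJ C K T P H));
my @JamoV = qw(A AE YA YAE EO E YEO YE O WA WAE OE YO U WEO WE WI YU EU YI I);
my @JamoT = ('', qw(G GG GS N NJ NH D L LG LM LB LS LT LP LH
                    M B BS S SS NG J C K T P H));

# Reverse tables: short jamo name => index.
my (%LIndex, %VIndex, %TIndex);
@LIndex{@JamoL} = 0 .. $#JamoL;
@VIndex{@JamoV} = 0 .. $#JamoV;
@TIndex{@JamoT} = 0 .. $#JamoT;

sub parseHangulName {
    my ($name) = @_;

    # Split into initial, medial, final short names.
    my ($l, $v, $t) = $name =~ /^
        (?:HANGUL\ SYLLABLE\ )?
        ([^AEIOUWY]*)([AEIOUWY]+)([^AEIOUWY]*)
    $/x or return undef;

    # The name is legal only if all three parts are legal jamo names.
    return undef
        unless exists $LIndex{$l} && exists $VIndex{$v} && exists $TIndex{$t};

    return SBase + ($LIndex{$l} * VCount + $VIndex{$v}) * TCount + $TIndex{$t};
}
```

Note that the initial and final tables differ (e.g. "R" is only an initial and "L" only a final), so checking each part against its own table is what rejects names like "HANGUL SYLLABLE LA".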
CAVEAT
This module does not try to handle *everything* about Hangul,
only the things
that are not included in Unicode.txt, NamesList.txt, etc.
and so must be derived from the argument by algorithm.
I think passing in a character outside the Hangul Syllables
should not cause a carp or croak,
since the functions assume the return value is *always* checked.
regards,
SADAHIRO Tomoyuki
E-mail: bqw10602(_at_)nifty(_dot_)com