perl-unicode

Re: [ANNOUNCE] Unicode::Normalize 0.21 and ::Collate 0.24 released

2003-04-08 19:30:05
SADAHIRO Tomoyuki wrote:

Could you add additional normalization for Korean Hangul Jamos
as outlined at http://jshin.net/i18n/korean/jamocomp.html ?
......

In summary, could you make your normalization package
offer a way to specify 'tailoring' (or some kind of
optional normalization)?

I looked up Jamo cluster compositions/decompositions a bit,
but they do not seem to conform to the algorithm of UAX #15.
( http://www.unicode.org/unicode/reports/tr15/ )

I can't thank you enough for taking a look at it and pointing out
the problems there.  Your reply got me to go through the whole list
and find the causes of the problem. One was an error in my script,
and the other was that I hadn't realized that the Unicode 2.0.14 data file
has comp/decomp. mappings for vowels that differ from what I thought
it had. Anyway, could you take a look at
it again? I think you only have to take the entries marked with 'decomposition
mapping' (in the three groups with the heading 'encoded').


and the decomposition mapping of

O-E (U+1180) must be O-EO (U+117F) + I (U+1175).

That is, O-E => O + E => O + (EO + I) => (O + EO) + I => O-EO + I.

I'm not following you here. I think O-E (U+1180) should
be fully decomposed into O (U+1169) + EO (U+1165) + I (U+1175).
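
The two readings of the O-E mapping can be compared with a small sketch.
It is in Python rather than Perl, purely for illustration. The table
TAILORED_DECOMP and the function full_decompose are hypothetical names,
and the pairwise mappings are assumptions taken from this thread, not
from the Unicode data files. Standard NFD leaves U+1180 untouched, since
the cluster Jamos carry no canonical decomposition.

```python
import unicodedata

# Hypothetical pairwise tailoring table (an assumption based on this thread;
# none of these mappings exist in the standard Unicode data).
TAILORED_DECOMP = {
    "\u1180": "\u1169\u1166",  # O-E  => O + E
    "\u1166": "\u1165\u1175",  # E    => EO + I
    "\u117F": "\u1169\u1165",  # O-EO => O + EO
}

def full_decompose(text, table=TAILORED_DECOMP):
    """Recursively expand every character that has a tailored mapping."""
    out = []
    for ch in text:
        if ch in table:
            out.append(full_decompose(table[ch], table))
        else:
            out.append(ch)
    return "".join(out)

# Standard NFD does not decompose cluster Jamos at all.
assert unicodedata.normalize("NFD", "\u1180") == "\u1180"

# Full decomposition reaches the three basic Jamos O + EO + I.
assert full_decompose("\u1180") == "\u1169\u1165\u1175"

# The pairwise form O-EO + I expands to the same basic sequence.
assert full_decompose("\u117F\u1175") == full_decompose("\u1180")
```

Under these assumed mappings the pairwise form and the full form are
equivalent once expanded all the way down, so the difference is only in
which form a decomposition table stores.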


PS. IMO, for any function to be integrated
into Unicode::Normalize, its behavior should be
specified, mentioned, or suggested in UAX #15.

You're right. The trouble is that the Korean standards body kind of messed
up the Hangul encoding in Unicode through a series of not-so-wise requests,
beginning with insisting on enumerating all 11,172 syllables in
precomposed form and culminating in its request to remove the
decomposition of complex/cluster Jamos into basic/simple Jamo sequences.
We will never be able to mend that, because the Unicode
comp/decomp. mappings are frozen.  Nonetheless, we have to
abide by it, which is why I'm looking for a way around it
and was glad to hear that the UTC might introduce a tailoring
mechanism for NFC/NFD/NFKC/NFKD.  In case it wasn't
clear, what I'm asking is that you prepare for this kind
of tailoring support.  Even if the UTC turns down the idea,
there will certainly be requests to tailor them.
Needless to say, this feature has to be offered as an option
(not as the default).
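
A minimal sketch of what "tailoring as an option" could look like, in
Python for illustration only. The function normalize_nfd and its
tailoring parameter are hypothetical; they are not part of
Unicode::Normalize or of UAX #15. With no tailoring given, the result is
plain NFD.

```python
import unicodedata

def normalize_nfd(text, tailoring=None):
    """NFD, optionally followed by a caller-supplied Jamo decomposition.

    `tailoring` maps one character to its replacement string and is
    expected to be fully expanded already (no further lookups are done).
    """
    result = unicodedata.normalize("NFD", text)
    if tailoring is None:  # default: standard behaviour only
        return result
    return "".join(tailoring.get(ch, ch) for ch in result)

# Hypothetical tailoring entry taken from this thread: O-E => O + EO + I.
JAMO_TAILORING = {"\u1180": "\u1169\u1165\u1175"}

word = "\u1100\u1180"  # KIYEOK + O-E

# Without the option nothing changes for cluster Jamos.
assert normalize_nfd(word) == word

# With the option the cluster vowel becomes a basic Jamo sequence.
assert normalize_nfd(word, JAMO_TAILORING) == "\u1100\u1169\u1165\u1175"
```

Keeping the table as an explicit argument means callers who never pass
it get exactly the standard forms.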

PS2. Jamo Composition (&composeJamo) may be easily implemented
by something like the following codelet.

 Thank you for the codelet. It'll be useful.

 Jungshik