Re: [ANNOUNCE] Unicode::Normalize 0.21 and ::Collate 0.24 released


On Fri, 11 Apr 2003 02:04:37 +0900
SADAHIRO Tomoyuki <bqw10602(_at_)nifty(_dot_)com> wrote:

All of the Jamo decomposition mappings in Unicode 2.0.14
are marked with <compat>, so they are not intended to be composed;
they exist only for compatibility decomposition.

I tried to derive "canonical" decomposition mappings from them (please
see below), treating them as if they were canonical, but this failed for
U+1181 and U+118C, because there is no composite for <U+1169, U+1167>
or for <U+116E, U+1167>.

These two jamo cannot be composed without modifying the algorithm.
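
To illustrate why the missing intermediate composite matters, here is a
minimal sketch of pairwise composition in C. It is not the code of
Unicode::Normalize. The two table entries are assumptions about what such
"as if canonical" pair mappings could look like (they are not taken from
Decomposition.pl), and the names jamo_compose_pair and jamo_compose are
hypothetical. All jamo are treated as starters, so only adjacent pairs can
combine.

```c
#include <stddef.h>

/* Hypothetical pair mappings, for illustration only. */
typedef struct { unsigned first, second, composite; } JamoPair;

static const JamoPair jamo_pairs[] = {
    { 0x1167, 0x1175, 0x1168 },  /* assumed: U+1167 + U+1175 -> U+1168 */
    { 0x1169, 0x1168, 0x1181 },  /* assumed: U+1169 + U+1168 -> U+1181 */
    /* no entry for <U+1169, U+1167>: no such composite exists */
};

/* Return the composite for <a, b>, or 0 if the table has none. */
static unsigned jamo_compose_pair(unsigned a, unsigned b)
{
    size_t i;
    for (i = 0; i < sizeof jamo_pairs / sizeof jamo_pairs[0]; i++)
        if (jamo_pairs[i].first == a && jamo_pairs[i].second == b)
            return jamo_pairs[i].composite;
    return 0;
}

/* Simplified left-to-right composition: each input character is merged
 * into the previous output character if a pair mapping exists, otherwise
 * it is appended. Returns the number of characters written to out,
 * which must have room for n characters. */
static size_t jamo_compose(const unsigned *in, size_t n, unsigned *out)
{
    size_t len = 0, i;
    for (i = 0; i < n; i++) {
        unsigned c = (len > 0) ? jamo_compose_pair(out[len - 1], in[i]) : 0;
        if (c != 0)
            out[len - 1] = c;    /* replace the previous character */
        else
            out[len++] = in[i];  /* no composite: keep as is */
    }
    return len;
}
```

Under these assumptions, composing <U+1169, U+1167, U+1175> stops at
<U+1169, U+1168>: the first pair has no composite, so U+1167 combines with
U+1175 instead, and the result is never revisited to form U+1181.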


The old compatibility decomposition mappings of "cluster Jamo"
gave inconsistent results for Hangul Syllables in NFKC (which was
called NCC in the old days).
I.e. Hangul syllables composed only of "simple Jamo" are maintained
as a syllable, but the others are not.
cf. http://www.unicode.org/unicode/reports/tr15/tr15-9.html

I have prepared "canonical" decomposition mappings, including those
for Hangul Syllables.  With them, all the Hangul Syllables will
be decomposed into "simple Jamo" sequences in NFD/NFKD,
and maintained as the same Syllable in NFC/NFKC.
(U+1181 and U+118C are still unsolved, though.)

I attach them as a patch against lib/unicore/Decomposition.pl
in Perl 5.8.0 or perl-current, named decomp.tar.gz.
(Sorry, it is quite big.)

The XS edition of Unicode::Normalize uses the
default decomposition mappings for Hangul Syllables
even if they are explicitly defined in Decomposition.pl.
I don't think it's a bug as long as tailoring is unsupported, though.

(The pure Perl edition of Unicode::Normalize (in CPAN)
does not have the above problem.)
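
For reference, the "default" mapping for Hangul Syllables is arithmetic
rather than table-driven. The following is a minimal sketch in C of that
arithmetic as described in the Unicode Standard. It is not the code from
Normalize.xs; the constant values are the standard Hangul ones, and the
function names hangul_decompose and hangul_compose are hypothetical.

```c
#include <stddef.h>

/* Hangul constants as given in the Unicode Standard. */
#define SBase  0xAC00u
#define LBase  0x1100u
#define VBase  0x1161u
#define TBase  0x11A7u
#define LCount 19u
#define VCount 21u
#define TCount 28u
#define NCount (VCount * TCount)  /* 588 */
#define SCount (LCount * NCount)  /* 11172 */

/* Decompose a precomposed syllable into simple jamo.
 * Returns the number of jamo written (2 or 3), or 0 if s is not
 * a precomposed Hangul syllable. */
static size_t hangul_decompose(unsigned s, unsigned out[3])
{
    unsigned idx, t;
    if (s < SBase || s >= SBase + SCount)
        return 0;
    idx = s - SBase;
    out[0] = LBase + idx / NCount;             /* leading consonant */
    out[1] = VBase + (idx % NCount) / TCount;  /* vowel */
    t = idx % TCount;
    if (t == 0)
        return 2;                              /* LV syllable, no trailing */
    out[2] = TBase + t;                        /* trailing consonant */
    return 3;
}

/* Compose L, V and an optional T (pass t = 0 for none).
 * Returns the syllable, or 0 if the inputs are out of range. */
static unsigned hangul_compose(unsigned l, unsigned v, unsigned t)
{
    unsigned ti = 0;
    if (l < LBase || l >= LBase + LCount) return 0;
    if (v < VBase || v >= VBase + VCount) return 0;
    if (t != 0) {
        if (t <= TBase || t >= TBase + TCount) return 0;
        ti = t - TBase;
    }
    return SBase + ((l - LBase) * VCount + (v - VBase)) * TCount + ti;
}
```

Because this computation never consults Decomposition.pl, explicit entries
for Hangul Syllables in that file have no effect on code that takes this
path.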


The following patch will make the XS edition
avoid using the default Hangul decomposition/composition.
I don't recommend this for general purposes.

diff -urN Unicode-Normalize-0.21~/Normalize.xs.XS Unicode-Normalize-0.21/Normalize.xs.XS
--- Unicode-Normalize-0.21~/Normalize.xs.XS     Thu Apr 03 23:54:02 2003
+++ Unicode-Normalize-0.21/Normalize.xs.XS      Sat Apr 12 19:53:30 2003
@@ -43,12 +43,12 @@
 #define Hangul_TFinal 0x11C2
 #define Hangul_TCount     28
 
-#define Hangul_IsS(u)  ((Hangul_SBase <= (u)) && ((u) <= Hangul_SFinal))
-#define Hangul_IsN(u)  (((u) - Hangul_SBase) % Hangul_TCount == 0)
-#define Hangul_IsLV(u) (Hangul_IsS(u) && Hangul_IsN(u))
-#define Hangul_IsL(u)  ((Hangul_LBase <= (u)) && ((u) <= Hangul_LFinal))
-#define Hangul_IsV(u)  ((Hangul_VBase <= (u)) && ((u) <= Hangul_VFinal))
-#define Hangul_IsT(u)  ((Hangul_TBase  < (u)) && ((u) <= Hangul_TFinal))
+#define Hangul_IsS(u)  FALSE
+#define Hangul_IsN(u)  FALSE
+#define Hangul_IsLV(u) FALSE
+#define Hangul_IsL(u)  FALSE
+#define Hangul_IsV(u)  FALSE
+#define Hangul_IsT(u)  FALSE
 /* HANGUL_H */
 
 /* this is used for canonical ordering of combining characters (c.c.). */
# END OF PATCH
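
To see what the patch switches off, here is a small sketch in C that
uses the original (unpatched) predicates to classify a code point. The
macro bodies are copied from the removed lines of the diff. The values of
Hangul_SBase, Hangul_SFinal, Hangul_LBase, Hangul_LFinal, Hangul_VBase,
Hangul_VFinal and Hangul_TBase are not visible in the hunk and are assumed
here to be the standard Unicode ones; the helper hangul_class is
hypothetical.

```c
/* Assumed values (standard Unicode Hangul ranges); only Hangul_TFinal
 * and Hangul_TCount appear in the patch context. */
#define Hangul_SBase  0xAC00
#define Hangul_SFinal 0xD7A3
#define Hangul_LBase  0x1100
#define Hangul_LFinal 0x1112
#define Hangul_VBase  0x1161
#define Hangul_VFinal 0x1175
#define Hangul_TBase  0x11A7
#define Hangul_TFinal 0x11C2
#define Hangul_TCount     28

/* Original predicates, as removed by the patch. */
#define Hangul_IsS(u)  ((Hangul_SBase <= (u)) && ((u) <= Hangul_SFinal))
#define Hangul_IsN(u)  (((u) - Hangul_SBase) % Hangul_TCount == 0)
#define Hangul_IsLV(u) (Hangul_IsS(u) && Hangul_IsN(u))
#define Hangul_IsL(u)  ((Hangul_LBase <= (u)) && ((u) <= Hangul_LFinal))
#define Hangul_IsV(u)  ((Hangul_VBase <= (u)) && ((u) <= Hangul_VFinal))
#define Hangul_IsT(u)  ((Hangul_TBase  < (u)) && ((u) <= Hangul_TFinal))

/* Hypothetical helper: label a code point by its Hangul class. */
static const char *hangul_class(unsigned u)
{
    if (Hangul_IsLV(u)) return "LV";   /* syllable without trailing jamo */
    if (Hangul_IsS(u))  return "LVT";  /* syllable with trailing jamo */
    if (Hangul_IsL(u))  return "L";
    if (Hangul_IsV(u))  return "V";
    if (Hangul_IsT(u))  return "T";    /* Hangul_TBase itself is excluded */
    return "other";
}
```

With the patched macros every predicate is FALSE, so each code point would
fall through to "other" here, and Hangul is then handled like any other
character instead of by the special-cased arithmetic.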

SADAHIRO Tomoyuki
