On Mon, 23 Jul 2001 13:43:30 -0500
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> wrote:
> Darn. Got me there, I am the one always warning people about the fact
> that Unicode is not 16 bit anymore :-)
> I think we should solve this somehow differently; I don't
> want to introduce a new huge-ish file (that is just a differently sorted
> version of an existing file) just to do the binary search.
I think the searching method doesn't matter :-)
so long as it is appropriate and can also handle
CJK Unified Ideographs and Hangul syllables.
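One way to cover those blocks without a second sorted file is to search a short table of ranges instead of individual code points. The following is only a sketch under that assumption: `@ranges` and `find_range` are hypothetical names, not anything that exists in lib/unicode, and the table holds just the Unicode 3.1 ideograph and Hangul ranges.

```perl
use strict;
use warnings;

# Hypothetical table: sorted, non-overlapping [first, last, label] ranges,
# using the Unicode 3.1 limits of each block.
my @ranges = (
    [0x3400,  0x4DB5,  'CJK Ideograph Extension A'],
    [0x4E00,  0x9FA5,  'CJK Ideograph'],
    [0xAC00,  0xD7A3,  'Hangul Syllable'],
    [0x20000, 0x2A6D6, 'CJK Ideograph Extension B'],
);

# Binary search over ranges: returns the label of the range containing
# $cp, or undef if $cp falls in none of them.
sub find_range {
    my ($cp) = @_;
    my ($lo, $hi) = (0, $#ranges);
    while ($lo <= $hi) {
        my $mid = int(($lo + $hi) / 2);
        my ($first, $last, $label) = @{ $ranges[$mid] };
        if    ($cp < $first) { $hi = $mid - 1 }
        elsif ($cp > $last)  { $lo = $mid + 1 }
        else                 { return $label }
    }
    return;
}

for my $cp (0x4E00, 0xD7A3, 0x2A6D6, 0x0590) {
    my $label = find_range($cp);
    printf "%04X: %s\n", $cp, defined $label ? $label : '(not in table)';
}
```

Because the comparison is against both ends of a range, code points above FFFF (such as Extension B) need no special handling.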
BTW, Hangul syllables must be decomposed canonically, mustn't they?
cf. DerivedDecompositionType-3.1.0.txt in Unicode 3.1
30FE ; canonical # Lm KATAKANA VOICED ITERATION MARK
AC00..D7A3 ; canonical # Lo [11172] HANGUL SYLLABLE GA..HANGUL SYLLABLE HIH
F900..FA0D ; canonical # Lo [270] CJK COMPATIBILITY IDEOGRAPH-F900..CJK COMPATIBILITY IDEOGRAPH-FA0D
But they are not included in lib/unicode/IsDecoCanon.pl.
And why does lib/unicode/IsCn.pl contain no characters?
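For the Hangul range, the canonical decomposition does not need to be listed per character, since the Unicode Standard defines it arithmetically. A minimal sketch of that arithmetic follows; the constant values are those given in the standard's Hangul decomposition algorithm, while `decompose_hangul` is a hypothetical helper name used only for illustration.

```perl
use strict;
use warnings;

# Constants from the Hangul syllable decomposition algorithm
# in the Unicode Standard.
use constant {
    SBase  => 0xAC00,   # HANGUL SYLLABLE GA
    LBase  => 0x1100,   # first leading consonant
    VBase  => 0x1161,   # first vowel
    TBase  => 0x11A7,   # one before the first trailing consonant
    TCount => 28,
    NCount => 588,      # VCount (21) * TCount (28)
    SCount => 11172,    # LCount (19) * NCount
};

# Hypothetical helper: returns the jamo code points of the canonical
# decomposition, or the code point itself if it is not a Hangul syllable.
sub decompose_hangul {
    my ($cp) = @_;
    my $s_index = $cp - SBase;
    return ($cp) if $s_index < 0 || $s_index >= SCount;

    my $l = LBase + int($s_index / NCount);
    my $v = VBase + int(($s_index % NCount) / TCount);
    my $t = TBase + $s_index % TCount;

    # A syllable without a trailing consonant decomposes to two jamo.
    return $t == TBase ? ($l, $v) : ($l, $v, $t);
}

for my $cp (0xAC00, 0xD7A3) {
    my @jamo = map { sprintf '%04X', $_ } decompose_hangul($cp);
    printf "%04X -> %s\n", $cp, "@jamo";
}
# AC00 -> 1100 1161
# D7A3 -> 1112 1175 11C2
```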
(see DerivedGeneralCategory-3.1.0.txt)
For example, should the test be changed like this?
# 0x0590 is in the Hebrew block but unused.
-ok($charinfo->{category}, undef);
+ok($charinfo->{category}, 'Cn');
regards,
SADAHIRO Tomoyuki
E-mail: bqw10602(_at_)nifty(_dot_)com