SADAHIRO Tomoyuki wrote:
Could you add additional normalization for Korean Hangul Jamos
as outlined at http://jshin.net/i18n/korean/jamocomp.html ?
......
In summary, could you make your normalization package
offer a way to specify 'tailoring' (or some kind of
optional normalization)?
......
I looked into the Jamo cluster compositions/decompositions a bit,
but they do not seem to conform to the algorithm of UAX #15.
( http://www.unicode.org/unicode/reports/tr15/ )
I can't thank you enough for taking a look at it and pointing out
the problems there. Your reply got me to go through the whole list
and find the causes of the problem. One was an error in my script,
and the other was that I didn't realize that the Unicode 2.0.14 data file
has composition/decomposition mappings for vowels that differ from
what I thought it had. Anyway, could you take a look at
it again? I think you only have to take the entries marked with 'decomposition
mapping' (in the three groups under the heading 'encoded').
All of the Jamo decomposition mappings in Unicode 2.0.14
are marked with <compat>, so they are not intended to be composed;
they are meant only for compatibility decomposition.
I tried to derive "canonical" decomposition mappings from them (please see
below), treating them as if they were canonical, but failed for U+1181 and
U+118C, because there is no composite for <U+1169, U+1167> or for
<U+116E, U+1167>.
These two jamo cannot be composed without modifying the algorithm.
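
The failure for U+1181 and U+118C can be reproduced with a small sketch. The code below is only an illustration, not the Unicode::Normalize implementation: it assumes every jamo is a starter (combining class 0), so composition reduces to merging adjacent pairs from left to right, and it uses just a handful of pairs taken from the listing in this message. The names PAIRS, COMPOSE and compose_pairwise are hypothetical.

```python
# Subset of the pairwise mappings from the listing: composite -> (first, second).
PAIRS = {
    0x1168: (0x1167, 0x1175),
    0x116F: (0x116E, 0x1165),
    0x1170: (0x116F, 0x1175),
    0x117F: (0x1169, 0x1165),
    0x1180: (0x117F, 0x1175),
}

# Reverse table used for composition: (first, second) -> composite.
COMPOSE = {pair: composite for composite, pair in PAIRS.items()}


def compose_pairwise(seq):
    """Merge adjacent code points left to right wherever a pair is defined.

    Simplified on purpose: combining classes and blocking are ignored,
    which is only reasonable because all code points here are starters.
    """
    out = []
    for cp in seq:
        if out and (out[-1], cp) in COMPOSE:
            out[-1] = COMPOSE[(out[-1], cp)]  # replace the last starter
        else:
            out.append(cp)
    return out


def show(seq):
    return " ".join(f"{cp:04X}" for cp in seq)


# U+1180 is reachable through the intermediate composite U+117F.
print(show(compose_pairwise([0x1169, 0x1165, 0x1175])))  # 1180
# U+1181 is not: <1169, 1167> has no composite, so only <1167, 1175> merges.
print(show(compose_pairwise([0x1169, 0x1167, 0x1175])))  # 1169 1168
```

Under these assumptions a pairwise algorithm never produces U+1181 from <U+1169, U+1167, U+1175>, because it would need an intermediate composite for the first two jamo. The same holds for U+118C and <U+116E, U+1167, U+1175>.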
P.S. To try the following additional mappings, add them
to lib/unicore/Decomposition.pl
(rebuilding is required for XS. --- cf. ext/Unicode/Normalize/README)
##START (This is not a patch)
1101 1100 1100
1104 1103 1103
1108 1107 1107
110A 1109 1109
110D 110C 110C
1113 1102 1100
1114 1102 1102
1115 1102 1103
1116 1102 1107
1117 1103 1100
1118 1105 1102
1119 1105 1105
111A 1105 1112
111B 1105 110B
111C 1106 1107
111D 1106 110B
111E 1107 1100
111F 1107 1102
1120 1107 1103
1121 1107 1109
1122 1121 1100
1123 1121 1103
1124 1121 1107
1125 1121 1109
1126 1121 110C
1127 1107 110C
1128 1107 110E
1129 1107 1110
112A 1107 1111
112B 1107 110B
112C 1108 110B
112D 1109 1100
112E 1109 1102
112F 1109 1103
1130 1109 1105
1131 1109 1106
1132 1109 1107
1133 1132 1100
1134 110A 1109
1135 1109 110B
1136 1109 110C
1137 1109 110E
1138 1109 110F
1139 1109 1110
113A 1109 1111
113B 1109 1112
113D 113C 113C
113F 113E 113E
1141 110B 1100
1142 110B 1103
1143 110B 1106
1144 110B 1107
1145 110B 1109
1146 110B 1140
1147 110B 110B
1148 110B 110C
1149 110B 110E
114A 110B 1110
114B 110B 1111
114D 110C 110B
114F 114E 114E
1151 1150 1150
1152 110E 110F
1153 110E 1112
1156 1111 1107
1157 1111 110B
1158 1112 1112
1162 1161 1175
1164 1163 1175
1166 1165 1175
1168 1167 1175
116A 1169 1161
116B 116A 1175
116C 1169 1175
116F 116E 1165
1170 116F 1175
1171 116E 1175
1174 1173 1175
1176 1161 1169
1177 1161 116E
1178 1163 1169
1179 1163 116D
117A 1165 1169
117B 1165 116E
117C 1165 1173
117D 1167 1169
117E 1167 116E
117F 1169 1165
1180 117F 1175
1181 <compat> 1169 1167 1175
1182 1169 1169
1183 1169 116E
1184 116D 1163
1185 1184 1175
1186 116D 1167
1187 116D 1169
1188 116D 1175
1189 116E 1161
118A 1189 1175
118B 116F 1173
118C <compat> 116E 1167 1175
118D 116E 116E
118E 1172 1161
118F 1172 1165
1190 118F 1175
1191 1172 1167
1192 1191 1175
1193 1172 116E
1194 1172 1175
1195 1173 116E
1196 1173 1173
1197 1174 116E
1198 1175 1161
1199 1175 1163
119A 1175 1169
119B 1175 116E
119C 1175 1173
119D 1175 119E
119F 119E 1165
11A0 119E 116E
11A1 119E 1175
11A2 119E 119E
11A9 11A8 11A8
11AA 11A8 11BA
11AC 11AB 11BD
11AD 11AB 11C2
11B0 11AF 11A8
11B1 11AF 11B7
11B2 11AF 11B8
11B3 11AF 11BA
11B4 11AF 11C0
11B5 11AF 11C1
11B6 11AF 11C2
11B9 11B8 11BA
11BB 11BA 11BA
11C3 11A8 11AF
11C4 11AA 11A8
11C5 11AB 11A8
11C6 11AB 11AE
11C7 11AB 11BA
11C8 11AB 11EB
11C9 11AB 11C0
11CA 11AE 11A8
11CB 11AE 11AF
11CC 11B0 11BA
11CD 11AF 11AB
11CE 11AF 11AE
11CF 11CE 11C2
11D0 11AF 11AF
11D1 11B1 11A8
11D2 11B1 11BA
11D3 11B2 11BA
11D4 11B2 11C2
11D5 11B2 11BC
11D6 11B3 11BA
11D7 11AF 11EB
11D8 11AF 11BF
11D9 11AF 11F9
11DA 11B7 11A8
11DB 11B7 11AF
11DC 11B7 11B8
11DD 11B7 11BA
11DE 11DD 11BA
11DF 11B7 11EB
11E0 11B7 11BE
11E1 11B7 11C2
11E2 11B7 11BC
11E3 11B8 11AF
11E4 11B8 11C1
11E5 11B8 11C2
11E6 11B8 11BC
11E7 11BA 11A8
11E8 11BA 11AE
11E9 11BA 11AF
11EA 11BA 11B8
11EC 11BC 11A8
11ED 11EC 11A8
11EE 11BC 11BC
11EF 11BC 11BF
11F1 11F0 11BA
11F2 11F0 11EB
11F3 11C1 11B8
11F4 11C1 11BC
11F5 11C2 11AB
11F6 11C2 11AF
11F7 11C2 11B7
11F8 11C2 11B8
##END
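
For checking the listing above outside of Perl, here is a minimal sketch that reads lines in the layout used in this message (composite first, an optional <compat> tag, then the parts) and expands a code point recursively into its basic jamo. It parses the listing as written here, not the actual on-disk format of lib/unicore/Decomposition.pl, and the names SAMPLE, parse_mappings and decompose are hypothetical.

```python
# A few lines copied from the listing; the full listing can be pasted instead.
SAMPLE = """\
1121 1107 1109
1122 1121 1100
117F 1169 1165
1180 117F 1175
1181 <compat> 1169 1167 1175
"""


def parse_mappings(text):
    """Return {composite: (tag_or_None, [parts])} for each mapping line."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the ##START / ##END markers
        rest = fields[1:]
        tag = None
        if rest and rest[0].startswith("<"):
            tag, rest = rest[0], rest[1:]
        table[int(fields[0], 16)] = (tag, [int(f, 16) for f in rest])
    return table


def decompose(cp, table):
    """Expand cp recursively until no part has a mapping of its own."""
    if cp not in table:
        return [cp]
    _tag, parts = table[cp]
    out = []
    for part in parts:
        out.extend(decompose(part, table))
    return out


table = parse_mappings(SAMPLE)
print([f"{cp:04X}" for cp in decompose(0x1122, table)])  # ['1107', '1109', '1100']
print(table[0x1181][0])  # <compat>
```

Keeping the tag next to the parts makes it easy to pick out the entries, such as U+1181 and U+118C, that were left as compatibility mappings.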
SADAHIRO Tomoyuki