Hank Tt <news2002nov(_at_)lomaji(_dot_)com> writes:
I'm trying to make a UCM file to feed to enc2xs. The legacy encoding for
Taiwanese romanization *must* have its code points mapped to Unicode
character sequences, for the simple reason that the UCS lacks the
corresponding precomposed characters (and is unlikely to have them in the
future, as they are composable using existing characters from the Latin
script and the Combining Diacritical Marks blocks). (See  for script
Are the Unicode character sequences in  normalized?
Can you explain what the diacritics mean? I assume ' ` ^ etc. are tone marks?
What do the macron, the dot, and the dots below signify?
Apparently the POJ system uses ten vowels
(a, e, i, m, ng, o, o dot above, u, u diaeresis below) and
five tone marks (acute, grave, circumflex, macron, vertical bar).
However, <dot above> (U+0307) and <acute> (U+0301) have the same
combining class (230: above), so canonical reordering never swaps them:
<o + acute + dot above> is not canonically equivalent to
<o + dot above + acute>.
If <o dot above> is a vowel and acute is a tone mark, their
combination <LATIN SMALL LETTER O WITH DOT ABOVE AND ACUTE>
should be encoded as <o + dot above + acute>, I think.
The same applies to <o + dot above + circumflex>,
<o + dot above + grave>, and <o + dot above + macron>.
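
To make the ordering issue concrete, here is a minimal sketch in Python.
It uses the standard unicodedata module, which is my choice for
illustration and not part of the enc2xs tool chain; the helper name
canonically_equivalent is hypothetical. It compares the two orders of
<dot above> and <acute>, and contrasts them with <dot below> (U+0323,
combining class 220), which normalization does reorder relative to
<acute>.

```python
import unicodedata

DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE
ACUTE = "\u0301"      # COMBINING ACUTE ACCENT
DOT_BELOW = "\u0323"  # COMBINING DOT BELOW


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Vowel mark first, tone mark last (the order suggested above).
vowel_then_tone = "o" + DOT_ABOVE + ACUTE
# The opposite order: tone mark first, vowel mark last.
tone_then_vowel = "o" + ACUTE + DOT_ABOVE

# Both marks share one combining class, so normalization keeps their order.
print(unicodedata.combining(DOT_ABOVE), unicodedata.combining(ACUTE))
print(canonically_equivalent(vowel_then_tone, tone_then_vowel))

# Marks with different combining classes are sorted by class, so the two
# orders of <acute> and <dot below> do normalize to the same sequence.
print(unicodedata.combining(DOT_BELOW))
print(canonically_equivalent("o" + ACUTE + DOT_BELOW, "o" + DOT_BELOW + ACUTE))
```

If the sketch behaves as I expect, the first comparison is False and the
second is True, which is why the mapping table has to pick one order for
the marks above and use it consistently.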