Sadahiro Tomoyuki <bqw10602(_at_)nifty(_dot_)com> writes:
Are the Unicode character sequences in [1] normalized?
Can you explain what the diacritics mean? I assume ', `, ^ etc. are tone marks?
What do the macron and dot and dots-below signify?
Apparently the POJ system uses ten vowels
(a, e, i, m, ng, o, o dot above, u, u diaeresis below)
Wearing my speech-synthesis hat for a change, I would call m and ng
nasals rather than vowels, but the distinction is a fine one.
If anyone here knows what these would be in IPA phonetics
please let me know off-list.
The choice of "o dot above" is asking for trouble when composing
glyphs, and that is presumably why "diaeresis below" was used for the
u variant rather than the mainstream Latin-1 one with it above.
and
five tone marks (acute, grave, circumflex, macron, vertical bar).
However, since <dot above> (U+0307) and <acute> (U+0301) have the same
combining class (230: above), <o + acute + dot above> is
not canonically equivalent to <o + dot above + acute>.
If <o dot above> is a vowel and acute is a tone mark, their
combination <LATIN SMALL LETTER O WITH DOT ABOVE AND ACUTE>
should be encoded as <o + dot above + acute>, I think.
Similarly <o + dot above + circumflex>, <o + dot above + grave>,
and <o + dot above + macron>.
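
The ordering point above can be checked with a short script. This is
only a sketch using Python's standard unicodedata module; the variable
names are my own, and the u + diaeresis below (U+0324) case is added
purely as a contrast, on the assumption that it is the character meant
by "u diaeresis below".

```python
import unicodedata

DOT_ABOVE = "\u0307"        # COMBINING DOT ABOVE
ACUTE = "\u0301"            # COMBINING ACUTE ACCENT
DIAERESIS_BELOW = "\u0324"  # COMBINING DIAERESIS BELOW (contrast case)

# Canonical combining classes decide whether marks may be reordered.
ccc_dot = unicodedata.combining(DOT_ABOVE)
ccc_acute = unicodedata.combining(ACUTE)
ccc_below = unicodedata.combining(DIAERESIS_BELOW)
print(ccc_dot, ccc_acute, ccc_below)

# Same class: normalization keeps the typed order, so the two
# sequences stay distinct.
vowel_then_tone = "o" + DOT_ABOVE + ACUTE
tone_then_vowel = "o" + ACUTE + DOT_ABOVE
same_class_equivalent = (
    unicodedata.normalize("NFD", vowel_then_tone)
    == unicodedata.normalize("NFD", tone_then_vowel)
)

# Different classes: canonical reordering puts the lower class first,
# so either typed order normalizes to the same sequence.
below_then_tone = "u" + DIAERESIS_BELOW + ACUTE
tone_then_below = "u" + ACUTE + DIAERESIS_BELOW
diff_class_equivalent = (
    unicodedata.normalize("NFD", below_then_tone)
    == unicodedata.normalize("NFD", tone_then_below)
)

print(same_class_equivalent, diff_class_equivalent)
```

So for <o dot above>, the vowel mark has to be typed (or generated)
before the tone mark, because normalization will not repair the order
afterwards; for <u diaeresis below> the order does not matter.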
SADAHIRO Tomoyuki