perl-unicode

Re: UCM file and combining character sequences

2003-09-23 12:30:05
Sadahiro Tomoyuki <bqw10602(_at_)nifty(_dot_)com> writes:
Are the Unicode character sequences in [1] normalized?
Can you explain what the diacritics mean? I assume '`^ etc. are tone marks?
What do the macron and dot and dots-below signify?

Apparently the POJ system uses ten vowels
(a, e, i, m, ng, o, o dot above, u, u diaeresis below) 

Wearing my speech-synthesis hat for a change, I would call m and ng
nasals rather than vowels, but the distinction is a fine one.
If anyone here knows what these would be in IPA phonetics 
please let me know off-list.

The choice of "o dot above" is asking for trouble when composing
glyphs, and is presumably why "diaeresis below" was used for the
u variant rather than the mainstream Latin-1 form with the diaeresis above.

and
five tone marks (acute, grave, circumflex, macron, vertical bar).
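To make the point about "diaeresis below" concrete, here is a small sketch using Python's standard `unicodedata` module (chosen only for illustration; Perl's Unicode::Normalize implements the same normalization forms). It assumes the POJ marks are encoded as U+0307 COMBINING DOT ABOVE and U+0324 COMBINING DIAERESIS BELOW, and uses acute as a sample tone mark. A below mark (class 220) and an above mark (class 230) get reordered canonically, so either typing order is equivalent; two above marks are not.

```python
import unicodedata

# Assumed encodings of the marks discussed in the thread.
DOT_ABOVE = "\u0307"        # COMBINING DOT ABOVE, combining class 230
DIAERESIS_BELOW = "\u0324"  # COMBINING DIAERESIS BELOW, combining class 220
ACUTE = "\u0301"            # COMBINING ACUTE ACCENT, combining class 230


def same_nfd(a: str, b: str) -> bool:
    """True if a and b are canonically equivalent (identical after NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


for mark in (DOT_ABOVE, DIAERESIS_BELOW, ACUTE):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}: ccc={unicodedata.combining(mark)}")

# Below (220) and above (230) marks: canonical reordering makes both orders equal.
print(same_nfd("u" + DIAERESIS_BELOW + ACUTE, "u" + ACUTE + DIAERESIS_BELOW))  # True
# Two above marks (both 230): their order is significant.
print(same_nfd("o" + DOT_ABOVE + ACUTE, "o" + ACUTE + DOT_ABOVE))  # False
```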

However, because <dot above> (U+0307) and <acute> (U+0301) have the same
combining class (230: Above), canonical reordering does not apply to them,
so <o + acute + dot above> is not canonically equivalent to
<o + dot above + acute>.
If <o dot above> is a vowel and acute is a tone mark, their
combination <LATIN SMALL LETTER O WITH DOT ABOVE AND ACUTE>
should be encoded as <o + dot above + acute>, I think.
The same applies to <o + dot above + circumflex>, <o + dot above + grave>,
and <o + dot above + macron>.
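The ordering argument can be checked for all five tone marks with a short Python sketch (again only illustrative; the mapping of "vertical bar" to U+030D COMBINING VERTICAL LINE ABOVE is an assumption, and the helper name is hypothetical). Every tone mark shares class 230 with the dot above, so neither order is equivalent to the other. The sketch also shows a practical consequence of the suggested vowel-then-tone order: NFC composes <o + dot above + macron> to the precomposed U+0231 LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON, while <o + macron + dot above> does not.

```python
import unicodedata

# Tone marks named in the thread; "vertical bar" -> U+030D is an assumption.
TONE_MARKS = {
    "acute": "\u0301",
    "grave": "\u0300",
    "circumflex": "\u0302",
    "macron": "\u0304",
    "vertical bar": "\u030D",
}
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE, class 230


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFD."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


for label, mark in TONE_MARKS.items():
    vowel_then_tone = "o" + DOT_ABOVE + mark  # suggested encoding
    tone_then_vowel = "o" + mark + DOT_ABOVE
    nfc = unicodedata.normalize("NFC", vowel_then_tone)
    print(
        label,
        unicodedata.combining(mark),                       # 230 for each mark
        canonically_equivalent(vowel_then_tone, tone_then_vowel),
        [f"U+{ord(c):04X}" for c in nfc],                  # NFC code points
    )
```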

SADAHIRO Tomoyuki
