perl-unicode

Combining characters in front of base characters after normalization

2002-02-28 07:30:25
I understand that Perl 5.8 will contain modules for normalizing Unicode
strings, for example into Normalization Form C.

If this is the case, here is a suggestion that I hope is simple to
implement and that I think would be very useful:

Would it be possible to give the normalization function that decomposes
everything into base and combining characters (Normalization Form D) a
"reverse order" option, which causes the combining characters to precede
the base character instead of following it?

In Unicode, combining characters follow the base character. In many
other environments it is the other way round (TeX: \" + a -> ä). So if
I want to write a Unicode-to-TeX converter, I first have to bring
Unicode to Normalization Form D, then move the combining characters in
front of the base character in reverse order, and finally substitute
them with the appropriate TeX sequences.
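The converter steps above can be sketched as follows. This is an
illustration in Python using the standard unicodedata module, not Perl
and not an existing normalization option. The names marks_before_base,
to_tex and TEX_ACCENTS are hypothetical, the accent table is a small
partial assumption, and any character with a nonzero canonical combining
class is treated as a combining mark. Decomposing with NFD turns ä into
"a" followed by U+0308 COMBINING DIAERESIS; reversing the marks keeps the
mark closest to the base innermost.

```python
import unicodedata

# Hypothetical, deliberately partial table: combining mark -> TeX accent.
TEX_ACCENTS = {
    "\u0300": "\\`",   # COMBINING GRAVE ACCENT
    "\u0301": "\\'",   # COMBINING ACUTE ACCENT
    "\u0302": "\\^",   # COMBINING CIRCUMFLEX ACCENT
    "\u0303": "\\~",   # COMBINING TILDE
    "\u0308": '\\"',   # COMBINING DIAERESIS
}


def marks_before_base(s):
    """Decompose s to NFD, then emit each base character's combining
    marks in reverse order, followed by the base character itself."""
    out = []      # finished output characters
    cluster = []  # current base character followed by its combining marks
    for ch in unicodedata.normalize("NFD", s):
        # Nonzero canonical combining class = combining mark (assumption).
        if unicodedata.combining(ch) and cluster:
            cluster.append(ch)
        else:
            out.extend(reversed(cluster[1:]))  # marks, last one first
            out.extend(cluster[:1])            # then the base character
            cluster = [ch]
    out.extend(reversed(cluster[1:]))
    out.extend(cluster[:1])
    return "".join(out)


def to_tex(s):
    """Replace each known combining mark by its TeX accent command."""
    return "".join(TEX_ACCENTS.get(ch, ch) for ch in marks_before_base(s))


print(to_tex("Müller"))
```

This prints M\"uller. Stacked accents are emitted as a plain sequence of
commands (\'\"u for U+01D8); real TeX input may additionally need braces
such as \'{\"u}.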

It would certainly be more convenient and efficient if the "move the
combining characters in front of the base character" step could already
be done in the normalization routine, since that routine has already
split up the string in memory appropriately.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>
