perl-unicode

Re: removing accents

2004-01-03 08:30:04
I'm afraid that taking NFD and then removing \pM characters (as remove_accent() does) removes too much: it strips marks that are not accents as well.

For example, it replaces '≠' (U+2260, <NOT EQUAL TO>) with '=' (U+003D, <EQUALS SIGN>).
NFD decomposes U+2260 into '=' followed by U+0338 <COMBINING LONG SOLIDUS OVERLAY>,
which encodes the mathematical "negation slash", and that combining mark is then removed.
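The effect is easy to reproduce. The sketch below is a hypothetical Python analogue of the approach (the remove_accent() in the thread is Perl): it decomposes with NFD and drops every code point whose general category starts with M, which stands in for Perl's \pM because Python's standard re module has no \p{...} classes.

```python
import unicodedata

def remove_accent(s: str) -> str:
    # Hypothetical Python analogue of the Perl remove_accent():
    # NFD-decompose, then drop every code point in general category M (Mn, Mc, Me).
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.category(c).startswith("M"))

# Intended effect: the acute accent (U+0301) is split off and dropped.
print(remove_accent("caf\u00e9"))          # cafe

# Side effect: NFD splits U+2260 into '=' + U+0338, and U+0338 is a mark (Mn).
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "\u2260")])
# ['U+003D', 'U+0338']
print(unicodedata.category("\u0338"))     # Mn
print(remove_accent("a \u2260 b"))         # a = b
```

The inequality silently turns into an equality, which changes the meaning of the text rather than just its spelling.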

Also, although they are not accents, it's unclear (and quite language-dependent)
what should be done with ligatures.
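Ligatures also show why mark stripping alone doesn't settle the question. In this hypothetical Python sketch (the NFKD comparison is added here for illustration and is not from the thread), NFD leaves 'ﬁ' (U+FB01) alone because it has only a compatibility decomposition; NFKD turns it into "fi". Neither form touches 'æ' or 'œ', which have no decomposition at all, so what to do with them remains a language-dependent choice.

```python
import unicodedata

def strip_marks(s: str, form: str = "NFD") -> str:
    # Decompose with the given normalization form, then drop category-M code points.
    decomposed = unicodedata.normalize(form, s)
    return "".join(c for c in decomposed if not unicodedata.category(c).startswith("M"))

samples = ["\ufb01le", "\u00e6ther", "c\u0153ur"]   # 'ﬁle', 'æther', 'cœur'
for s in samples:
    # Columns: original, after NFD + strip, after NFKD + strip.
    print(s, strip_marks(s), strip_marks(s, "NFKD"))
# ﬁle ﬁle file
# æther æther æther
# cœur cœur cœur
```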

--
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/ "There is this special
biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen

