Re: removing accents

On 3 Jan 04, at 15:49, Jarkko Hietaniemi wrote:

I'm afraid the process of taking NFD and then removing \pM characters
(remove_accent() as below) would remove too many marks other than accents.
Say, it replaces '≠' (U+2260, <NOT EQUAL TO>) with '=' (<EQUALS SIGN>),
since a mathematical "negation slash" is encoded by U+0338
<COMBINING LONG SOLIDUS OVERLAY>, which is to be removed.
Also, although they are not accents, it's unclear (and quite language-dependent)
what should be done with ligatures.
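
To make the point above concrete, here is a minimal sketch of the "NFD, then drop every mark" approach. It is written in Python rather than Perl, and strip_marks() is a hypothetical stand-in for the remove_accent() mentioned above, not that function's actual code. It assumes that "\pM" corresponds to every general category beginning with "M" (Mn, Mc, Me).

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop every character whose general
    category starts with 'M' (the rough equivalent of Perl's \\pM)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )

print(strip_marks("café"))    # accent removed as intended: cafe
print(strip_marks("a ≠ b"))   # U+2260 decomposes to '=' + U+0338: a = b
print(strip_marks("œuvre"))   # œ has no decomposition, so it is left alone
```

The second call shows the side effect described above: the combining solidus is a mark, so it is removed along with the accents. The third shows that ligatures such as œ are not touched at all by this approach, so they need a separate decision.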


Thanks to you both for your replies. I did some more research
and found that even removing accents is locale-dependent.
I reverted to my carefully crafted tr()s, which incidentally are
much faster than the Unicode::Normalize / remove \pM approach.
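
For illustration only, an explicit translation table might look like the sketch below. It is in Python rather than Perl, and the tables are hypothetical: the actual tr() lists are not shown in this message. The German table follows the common convention of writing ä as ae, which is one example of why a single locale-independent rule does not work. Unlike Perl's tr, Python's str.translate can map one character to several.

```python
# Hypothetical tables, not the tr() lists referred to in the message.
# One-to-one mapping, similar in spirit to Perl's tr/äöü/aou/.
PLAIN = str.maketrans("äöü", "aou")

# German convention: umlauts expand to two letters, ß to ss.
GERMAN = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def remove_accents(text: str, table: dict) -> str:
    """Replace only the characters listed in the table; all others,
    including symbols such as U+2260, pass through unchanged."""
    return text.translate(table)

print(remove_accents("Müller", PLAIN))    # Muller
print(remove_accents("Müller", GERMAN))   # Mueller
print(remove_accents("a ≠ b", PLAIN))     # a ≠ b
```

Because only listed characters are replaced, nothing outside the table is altered, at the cost of maintaining one table per language.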

--
Eric Cholet
