perl-i18n

Re: Stripping out Unicode combining characters (diacritics)

2008-05-06 04:56:09
On Mon, May 5, 2008 at 8:26 PM, Doran, Michael D <doran(_at_)uta(_dot_)edu> wrote:
[snip]

> I'm pulling my hair out on this... so any help would be appreciated.  If
> there's any other info I can provide, let me know.


You'll want to normalize the text to NFD (canonical decomposition:
base characters followed by their combining marks) instead of NFC
(precomposed characters) using Unicode::Normalize, and then delete
the combining marks:

 use Unicode::Normalize;

 # Decompose first, then remove every run of combining marks (\pM = Mark).
 my $text = NFD($original);
 $text =~ s/\pM+//g;
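
For what it's worth, here is the same idea as a small self-contained script. Treat it as a minimal sketch: the helper name strip_marks and the sample string are made up for illustration and are not taken from your code, and it assumes the input is already a decoded character string rather than raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper (name is an assumption, not from the original post).
# Expects a decoded character string, not undecoded UTF-8 octets.
sub strip_marks {
    my ($original) = @_;
    my $text = NFD($original);   # U+0159 becomes "r" followed by U+030C
    $text =~ s/\p{M}+//g;        # drop each run of combining marks
    return $text;
}

# Sample input written with escapes so the script does not depend on
# the encoding of the source file: "Dvo" + U+0159 + U+00E1 + "k".
my $sample = "Dvo\x{159}\x{e1}k";

binmode STDOUT, ':encoding(UTF-8)';
print strip_marks($sample), "\n";   # prints "Dvorak"
```

One caveat with this approach: it only affects characters that have a canonical decomposition. Letters such as U+00F8 (o with stroke) do not decompose into a base letter plus a mark, so they pass through unchanged and would need a separate mapping if you want them folded as well.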

Hope that helps.

-- 
Mike Rylander
 | VP, Research and Design
 | Equinox Software, Inc. / The Evergreen Experts
 | phone: 1-877-OPEN-ILS (673-6457)
 | email: miker(_at_)esilibrary(_dot_)com
 | web: http://www.esilibrary.com