perl-unicode

Re: Caseless and accentless string comparisons

2003-05-12 05:30:05



On Mon, 12 May 2003 Martin_Hosken(_at_)sil(_dot_)org wrote:

says that you should do that in some cases, but doesn't say how.  I
have poked around a bit and nothing springs out at me.  Is there a

I would use Unicode::Normalize to convert the string to NFC and then delete
the combining characters:

$str = NFC($input);
$str =~ s/\pM//og;

  You meant NFD, didn't you?  NFC recomposes base letters and combining
marks into precomposed characters wherever possible, so \pM would find
little left to delete.  The proposed update of UTS #10
(http://www.unicode.org/reports/tr10/tr10-10.html) may be of interest
as well, although it is still a draft and needs some refining (for
instance, its Hangul Jamo handling is not satisfactory).  Is there any
Perl module that implements Unicode collation as described in UTS #10,
or the collation algorithm specified in the forthcoming ISO 14651 (as
it currently stands)?
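
  For reference, here is a minimal sketch of the NFD variant of the
snippet quoted above.  The helper name fold_key is hypothetical, and
using lc for the caseless part is an assumption on my side (it is only
a simple lowercase mapping, not full Unicode case folding), so treat
this as an illustration rather than a complete solution:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: reduce a string to a caseless, accentless key.
sub fold_key {
    my ($input) = @_;
    my $str = NFD($input);   # decompose: U+00E9 becomes "e" + U+0301
    $str =~ s/\pM//g;        # delete every combining mark
    return lc $str;          # assumption: simple lowercasing is enough
}

# U+00E9 is LATIN SMALL LETTER E WITH ACUTE.
my $same = fold_key("R\x{E9}sum\x{E9}") eq fold_key("RESUME");
print $same ? "equal\n" : "different\n";   # prints "equal"
```

  Note that this only removes marks that NFD actually separates from
the base letter; characters without a canonical decomposition (for
example U+00F8, o with stroke) pass through unchanged.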

  Jungshik