perl-unicode

Re: [perl #22111] perl::Encode doesn't handle UTF-8 NFD strings

2003-05-06 06:30:06
On 2003-05-06 at 21:21 +0900 Dan Kogai sent off:
If perl were an application like, say, a word processor, I would agree that perl and Encode should handle normalization internally and transparently so that "canonically-equivalent" strings compare as equal. But perl is a PROGRAMMING LANGUAGE, so you have to be able to treat different (though possibly equivalent Unicode-wise) things differently by default. Otherwise you can't even implement a new normalization in perl. So I do not consider this a bug, since perl 5.8 comes with both Encode and Unicode::Normalize.

This gives a chance to work around this bug (yes, I think it is one).


If you want to do it transparently, you can always use Encode::Encoding to implement your own. Here is an example.

Well, see: from_to claims to convert from encoding1 to encoding2, and encoding1 in this case is UTF-8. Non-composed UTF-8 is perfectly valid UTF-8, and there's absolutely no reason why from_to($string,"utf8","latin1") should not work just because I used the NFD form and not the NFC form. Your example is just a way to work around this bug, but from_to should not care whether the initial string is NFC or NFD.
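
To make the difference concrete, here is a minimal sketch, assuming perl 5.8 or later with the bundled Encode and Unicode::Normalize. The variable names are only for illustration. It feeds the NFD form of "ä" to from_to directly, and then goes through the decode / NFC / encode workaround:

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);
use Unicode::Normalize qw(NFC);

# "ä" in NFD: 'a' (U+0061) followed by COMBINING DIAERESIS (U+0308).
my $nfd_chars  = "a\x{0308}";
my $nfd_octets = encode("UTF-8", $nfd_chars);   # UTF-8 octets 61 CC 88

# Direct conversion of the NFD octets. U+0308 has no Latin-1 mapping,
# so the result is not the single octet E4 one might hope for.
my $direct = $nfd_octets;
from_to($direct, "utf8", "iso-8859-1");

# Workaround: decode to characters, compose with NFC, then encode.
my $normalized = encode("iso-8859-1", NFC(decode("UTF-8", $nfd_octets)));

printf "direct:     %s\n", unpack("H*", $direct);
printf "normalized: %s\n", unpack("H*", $normalized);   # e4
```

The second path yields the Latin-1 octet for "ä"; the first does not, which is exactly the behaviour I am complaining about.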

Bjoern