perl-unicode

Re: [perl #22111] perl::Encode doesn't handle UTF-8 NFD strings

2003-05-13 05:30:29
On Tue, May 13, 2003 at 01:43:40PM +0200, Bjoern Jacke wrote:
On 2003-05-06 at 15:58 +0300 Jarkko Hietaniemi sent off:
well, see: from_to claims to convert from encoding1 to encoding2. 
encoding1 in this case is utf-8. Also the non-composed UTF-8 is 
perfectly valid UTF-8 and there's absolutely no reason, why 
from_to($string,"utf8","latin1") should not work just because I used 
the NFD form and not the NFC form. Your example is just a way to work 

You are assuming the equivalence of (pre)composed characters and
their decomposed forms.  Perl doesn't do this at any level.

you are not right.

$string = "äpfel";
$string_nfd=NFD($string);
$string_nfc=NFC($string);
if ($string_nfd eq $string_nfc) {
      print "This will be printed!";
}
if (NFD($string_nfd) eq $string_nfc) {
      print "This will *not* be printed!";
}

I am confused.  The above prints nothing (no surprise there since the
bytes 0x61 0xcc 0x88 0x70 0x66 0x65 0x6c are very different from the
bytes 0xc3 0xa4 0x70 0x66 0x65 0x6c).  Are you saying it should test
true in the first case?  If so, I strongly disagree.
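
To make the byte-level difference concrete, here is a minimal sketch, assuming the Unicode::Normalize and Encode modules are installed. The helper name utf8_hex is made up for this illustration and is not part of either module. The string is written with an escape so the result does not depend on the encoding of the source file.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC NFD);

# "äpfel" with U+00E4 spelled as an escape (assumption: avoids any
# dependence on how the script file itself is encoded).
my $string = "\x{E4}pfel";
my $nfc    = NFC($string);   # precomposed: U+00E4
my $nfd    = NFD($string);   # decomposed:  'a' followed by U+0308

# Hypothetical helper: hex dump of the UTF-8 encoding of a character string.
sub utf8_hex {
    my ($chars) = @_;
    my $octets = encode('UTF-8', $chars);
    return join ' ', map { sprintf '0x%02x', ord($_) } split //, $octets;
}

print "NFC: ", utf8_hex($nfc), "\n";
print "NFD: ", utf8_hex($nfd), "\n";

# eq compares the strings character by character; it applies no
# canonical equivalence, so the two forms compare as different.
print "eq:  ", ($nfd eq $nfc ? "equal" : "not equal"), "\n";
```

The two dumps should match the byte sequences quoted above, and the comparison should report "not equal".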

I still say that this is a bug and encode should be able to convert 

There is no Encode in the above.

NFD("Äpfel") to latin1. If you say it shouldn't it's like saying an 
English translator shouldn't be able to translate American English, 
just because they have a few different words than the British folks.

I am sorry but I think you are simply flat out wrong and I do not feel
like arguing about this any more.  Perl works at the level of bytes
and characters, not at the level of character equivalences-- that a
native Latin1 character should be equivalent to a somehow decomposed
Unicode presentation of the same character.  They are not.
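
Since no equivalence is applied implicitly, a caller who wants decomposed input to end up as Latin-1 has to normalize explicitly first. The following is only a sketch under that assumption: utf8_to_latin1_nfc is a hypothetical helper written for this illustration, not something Encode provides, and it assumes Unicode::Normalize is installed.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFC);

# UTF-8 octets for the decomposed spelling of "äpfel":
# 'a', then U+0308 COMBINING DIAERESIS (0xcc 0x88), then "pfel".
my $utf8_nfd_octets = "a\xcc\x88pfel";

# Hypothetical helper (not part of Encode): decode the UTF-8 octets to
# characters, compose them to NFC, then encode the result as Latin-1.
sub utf8_to_latin1_nfc {
    my ($octets) = @_;
    my $chars = decode('UTF-8', $octets);
    return encode('iso-8859-1', NFC($chars));
}

my $latin1 = utf8_to_latin1_nfc($utf8_nfd_octets);

# Hex dump of the resulting Latin-1 octets.
print join(' ', map { sprintf '0x%02x', ord($_) } split //, $latin1), "\n";
```

After composition the first character is U+00E4, which Latin-1 can represent as the single byte 0xe4. Without the NFC step, the combining mark U+0308 has no Latin-1 counterpart.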

-- 
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> http://www.iki.fi/jhi/
"There is this special biologist word we use for 'stable'.  It is 'dead'." -- Jack Cohen