perl-unicode

Re: [perl #22111] perl::Encode doesn't handle UTF-8 NFD strings

2003-05-13 06:30:06
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:

There is no Encode in the above.

NFD("Äpfel") to latin1. If you say it shouldn't, it's like saying an
English translator shouldn't be able to translate American English
just because the Americans use a few different words than the British folks.

I am sorry, but I think you are simply flat out wrong, and I do not feel
like arguing about this any more.  Perl works at the level of bytes
and characters, not at the level of character equivalences (the idea
that a native Latin1 character should be equivalent to some decomposed
Unicode representation of the same character).  They are not.
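To make "bytes and characters, not equivalences" concrete, here is a small sketch. It uses Python's standard `unicodedata` module as a stand-in for Perl's Unicode::Normalize, which is an assumption of this illustration rather than anything shown in the thread. It compares precomposed "Äpfel" with its NFD form at the plain character level.

```python
import unicodedata

# "Äpfel" spelled with precomposed U+00C4 LATIN CAPITAL LETTER A WITH DIAERESIS
composed = "\u00c4pfel"
# The NFD form: 'A' followed by U+0308 COMBINING DIAERESIS
decomposed = unicodedata.normalize("NFD", composed)

# Compared character by character, these are simply different strings...
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 5 6

# ...they only become equal once the caller explicitly normalizes.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

Equivalence is something the caller requests through normalization. Plain string comparison does not apply it.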

For what it is worth, Encode works at the character level as well.
Some decomposed (NFD) characters are to some extent representable in
latin1: for example, Ä could be written as 'A' followed by
<U00A8> # DIAERESIS.  That takes a little laxity, perhaps, but it is possible.
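One way to picture the "little laxity" is a sketch in Python. It does not use Encode or any real Encode mapping table. The `LAX_SPACING` table and the `lax_latin1` helper are hypothetical, invented for this illustration. The helper swaps a few combining marks for the spacing look-alikes that do exist in Latin-1 (U+00A8 DIAERESIS, U+00B4 ACUTE ACCENT, U+00B8 CEDILLA, U+00AF MACRON). It leaves everything else to a strict encode.

```python
import unicodedata

# Hypothetical lax table: combining mark -> spacing character present in Latin-1
LAX_SPACING = {
    "\u0308": "\u00a8",  # COMBINING DIAERESIS -> DIAERESIS
    "\u0301": "\u00b4",  # COMBINING ACUTE ACCENT -> ACUTE ACCENT
    "\u0327": "\u00b8",  # COMBINING CEDILLA -> CEDILLA
    "\u0304": "\u00af",  # COMBINING MACRON -> MACRON
}

def lax_latin1(text: str) -> bytes:
    """Hypothetical helper: encode decomposed text to Latin-1 by replacing
    known combining marks with spacing equivalents instead of composing."""
    mapped = "".join(LAX_SPACING.get(ch, ch) for ch in text)
    # Any character still outside Latin-1 raises UnicodeEncodeError here.
    return mapped.encode("latin-1")

nfd = unicodedata.normalize("NFD", "\u00c4pfel")
print(lax_latin1(nfd))  # b'A\xa8pfel'
```

The output keeps the base letter and the diaeresis as two separate Latin-1 characters. Forcing composition in the encoder would discard this distinction.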

If Encode or perl coerced the normalization, mappings like these would be
lost.  So the current scheme makes easy things easy (possibly with a call
to NFC() if necessary) and hard things possible.
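The "easy things easy" path normalizes to NFC first and then encodes. Here is a sketch of that idea, again using Python's `unicodedata` and `str.encode` in place of Perl's NFC() and Encode. The `to_latin1` name is hypothetical.

```python
import unicodedata

def to_latin1(text: str) -> bytes:
    """Hypothetical helper: compose to NFC, then encode strictly to Latin-1."""
    return unicodedata.normalize("NFC", text).encode("latin-1")

nfd = unicodedata.normalize("NFD", "\u00c4pfel")

# Encoding the decomposed string directly fails: U+0308 has no Latin-1 byte.
try:
    nfd.encode("latin-1")
except UnicodeEncodeError as err:
    print("strict encode failed at", err.start)  # strict encode failed at 1

# After NFC, 'A' + U+0308 becomes U+00C4, which is byte 0xC4 in Latin-1.
print(to_latin1(nfd))  # b'\xc4pfel'
```

Normalization is a single explicit call made by the caller. The encoder itself stays a plain character-to-byte mapping.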


-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/