Re: [perl #22111] perl::Encode doesn't handle UTF-8 NFD strings

Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:


There is no Encode in the above.

NFD("Äpfel") to latin1. If you say it shouldn't it's like saying an 
English translator shouldn't be able to translate American English, 
just because they have a few different words than the British folks.


I am sorry but I think you are simply flat out wrong and I do not feel
like arguing about this any more.  Perl works at the level of bytes
and characters, not at the level of character equivalences-- that a
native Latin1 character should be equivalent to a somehow decomposed
Unicode presentation of the same character.  They are not.


For what it is worth, Encode works at the character level as well.
Some decomposed (NFD) chars are to some extent representable in latin1
in that (for example) Ä could be 'A' and <U00A8> # DIAERESIS, with 
a little laxity perhaps, but it is possible.

If Encode or perl coerced the normalization then these would get lost.
So the current scheme makes easy things easy (possibly with a call
to NFC() if necessary) and hard things possible.
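
A minimal sketch of the point above, assuming the stock Encode and
Unicode::Normalize modules. The variable names are illustrative, and the
U+0308 to U+00A8 substitution is a hand-rolled "laxity", not something
Encode does on its own.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Unicode::Normalize qw(NFC NFD);

my $nfc = "\x{C4}pfel";    # precomposed A WITH DIAERESIS, 5 characters
my $nfd = NFD($nfc);       # "A" + U+0308 COMBINING DIAERESIS, 6 characters

# Perl compares characters, not canonical equivalences.
my $same = ($nfc eq $nfd) ? 1 : 0;

# Encoding the NFD form as-is: U+0308 has no latin1 mapping, so with the
# default CHECK it is replaced by the substitution character "?".
my $direct = encode("iso-8859-1", $nfd);

# Easy thing: normalize first, then encode.
my $composed = encode("iso-8859-1", NFC($nfd));

# Hard thing (hypothetical lossy mapping): keep the decomposition and
# stand in the spacing DIAERESIS U+00A8 for the combining one.
(my $lax = $nfd) =~ s/\x{308}/\x{A8}/g;
my $spaced = encode("iso-8859-1", $lax);

printf "same=%d direct=%s composed=%s spaced=%s\n",
    $same, map { unpack "H*", $_ } $direct, $composed, $spaced;
```

Because Encode leaves normalization to the caller, both the composed
result and the decomposed-with-laxity result remain available.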




-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
