perl-unicode

Re: [perl #22111] perl::Encode doesn't handle UTF-8 NFD strings

2003-05-07 01:30:08
Bjoern Jacke <debianbugs(_at_)j3e(_dot_)de> writes:

>> If you want to do it transparently, you can always use Encode::Encoding
>> to implement your own.  Here is an example.
>
> Well, see: from_to claims to convert from encoding1 to encoding2, and
> encoding1 in this case is utf-8. Also, non-composed UTF-8 is
> perfectly valid UTF-8, and there's absolutely no reason why
> from_to($string,"utf8","latin1") should fail just because I used
> the NFD form and not the NFC form. Your example is just a way to work
> around this bug; from_to should not care whether the initial string is
> NFC or NFD.
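A minimal sketch of the situation described above, written in Python rather than Perl (Python's `str.encode`/`bytes.decode` and `unicodedata` stand in for Encode here; `utf8_to_latin1` is a hypothetical helper, not part of either library). The NFC and NFD spellings of "é" are both valid UTF-8, but a converter that maps code points one by one only finds a Latin-1 equivalent for the precomposed form.

```python
import unicodedata
from typing import Optional

# "é" in two canonically equivalent forms
nfc = "\u00e9"                             # precomposed: U+00E9
nfd = unicodedata.normalize("NFD", nfc)    # "e" + U+0301 COMBINING ACUTE ACCENT

# Both forms encode to perfectly valid UTF-8
nfc_utf8 = nfc.encode("utf-8")   # b'\xc3\xa9'
nfd_utf8 = nfd.encode("utf-8")   # b'e\xcc\x81'

def utf8_to_latin1(data: bytes) -> Optional[bytes]:
    """Hypothetical naive converter: decode UTF-8, then map each code
    point to Latin-1 unchanged, with no normalization step."""
    try:
        return data.decode("utf-8").encode("latin-1")
    except UnicodeEncodeError:
        # U+0301 has no Latin-1 equivalent, so the NFD form fails here
        return None

print(utf8_to_latin1(nfc_utf8))  # b'\xe9'
print(utf8_to_latin1(nfd_utf8))  # None
```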

Most of Perl's encodings are octet-sequence-to-octet-sequence converters,
which are easy to code, compact, reasonably fast and ... dumb!
I also probably gave more thought to the decode step (from some form to
Unicode) than to the encode step; for decode, producing NFC is natural.

Perhaps it makes sense to add a tweak to the encode side so that, if no
encoding exists for a code point and the code-point sequence is not
normalized, it tries to normalize?
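As a rough illustration of why decode naturally yields NFC, taking Latin-1 as a representative legacy 8-bit encoding (an assumption; other legacy charsets are not checked here): Latin-1 contains only precomposed characters and no combining marks, so decoding any Latin-1 bytes already gives NFC text. A UTF-8 decode, by contrast, passes an NFD sequence through untouched. Python's `unicodedata` stands in for Perl's normalization support in this sketch.

```python
import unicodedata

# Decode all 256 possible Latin-1 bytes; the result holds only
# precomposed characters, so NFC normalization leaves it unchanged.
latin1_text = bytes(range(256)).decode("latin-1")
latin1_is_nfc = unicodedata.normalize("NFC", latin1_text) == latin1_text

# Decoding UTF-8 does not normalize: an NFD "é" stays as "e" + U+0301.
nfd_text = b"e\xcc\x81".decode("utf-8")
utf8_kept_nfd = nfd_text == "e\u0301"

print(latin1_is_nfc)   # True
print(utf8_kept_nfd)   # True
```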
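A minimal sketch of the proposed encode-side tweak, again in Python and not Encode's actual API: `encode_with_nfc_fallback` is a hypothetical helper that first encodes the text as-is and, only if some code point has no mapping and the text is not already NFC, retries once on the NFC form. For simplicity it normalizes the whole string rather than only the offending code-point sequence.

```python
import unicodedata

def encode_with_nfc_fallback(text: str, encoding: str, errors: str = "strict") -> bytes:
    """Hypothetical helper: encode text; on an unmappable code point,
    retry on the NFC form if that differs from the input."""
    try:
        # Fast path: most text encodes directly with no normalization
        return text.encode(encoding)
    except UnicodeEncodeError:
        normalized = unicodedata.normalize("NFC", text)
        if normalized == text:
            # Already NFC: normalizing cannot help, so apply the caller's policy
            return text.encode(encoding, errors)
        # Retry on the composed form; still-unmappable characters follow `errors`
        return normalized.encode(encoding, errors)

print(encode_with_nfc_fallback("e\u0301", "latin-1"))                   # b'\xe9'
print(encode_with_nfc_fallback("e\u0301\u20ac", "latin-1", "replace"))  # b'\xe9?'
```

Text that is already NFC, or that still contains an unmappable character after normalization (such as U+20AC EURO SIGN with Latin-1), falls through to the caller's `errors` policy, so the fallback does not hide a genuine gap in the target encoding.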

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/