perl-unicode

Re: [perl #22111] perl::Encode doesn't handle UTF-8 NFD strings

2003-05-06 08:30:05

On Tue, 6 May 2003 14:46:06 +0200
Bjoern Jacke <debianbugs(_at_)j3e(_dot_)de> wrote:

(snip)
If you want to do it transparently, you can always use Encode::Encoding 
to implement your own.  Here is an example.

Well, see: from_to claims to convert from encoding1 to encoding2.
Here encoding1 is UTF-8. Non-composed UTF-8 is also perfectly valid
UTF-8, and there is absolutely no reason why
from_to($string,"utf8","latin1") should fail just because I used
the NFD form rather than the NFC form. Your example is just a
workaround for this bug; from_to should not care whether the initial
string is NFC or NFD.

Bjoern

Some information loss is unavoidable
when you convert Unicode to a legacy (non-Unicode)
encoding whose repertoire is only a subset of Unicode.
Latin1 is, of course, one such legacy encoding.
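
To make the point concrete, here is a minimal sketch (not the example
snipped above) of what happens to one decomposed character. It assumes
the core modules Encode and Unicode::Normalize; the variable names are
only illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFC);

# "e" followed by U+0301 COMBINING ACUTE ACCENT, as raw UTF-8 octets (NFD).
my $nfd_octets = "e\xCC\x81";

# Octets -> Perl character string (two characters: U+0065, U+0301).
my $chars = decode('UTF-8', $nfd_octets);

# Without composition, U+0301 has no Latin-1 equivalent, so encode()
# falls back to a substitution character for it.
my $uncomposed = encode('iso-8859-1', $chars);

# With composition, the pair becomes U+00E9, which Latin-1 does contain.
my $composed = encode('iso-8859-1', NFC($chars));

# Show both results as hex octets.
for my $pair ([uncomposed => $uncomposed], [composed => $composed]) {
    my ($label, $octets) = @$pair;
    printf "%-11s %s\n", "$label:",
        join ' ', map { sprintf '%02X', ord } split //, $octets;
}
```

The explicit NFC() call is where the caller accepts the loss: after
conversion, nothing in the Latin1 octets records that the input was
decomposed.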

The "normalizability" (normalization behavior) of a legacy
encoding is defined in UAX #15:

http://www.unicode.org/reports/tr15/#Legacy_Encodings

According to this annex, Latin1 is unnormalizable in every form except NFC.
So Latin1 cannot faithfully represent text in NFD, NFKC, or NFKD.
In fact, a legacy encoding may be unnormalizable in all of the
normalization forms; e.g. the encodings specified by JIS X 0208.
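
The Latin1 claim can be checked roughly with the sketch below. It is
only an approximation of the UAX #15 definition: it tests single
characters rather than arbitrary strings, and it assumes the core
Unicode::Normalize module. The %form and %escapes names are my own.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

my %form = (NFC => \&NFC, NFD => \&NFD, NFKC => \&NFKC, NFKD => \&NFKD);

# For each form, count the Latin-1 characters whose normalized result
# contains a character outside the Latin-1 repertoire (above U+00FF).
my %escapes;
for my $name (sort keys %form) {
    $escapes{$name} = 0;
    for my $cp (0x00 .. 0xFF) {
        my $ch = chr $cp;
        utf8::upgrade($ch);    # make sure it is handled as a Unicode character
        my $normalized = $form{$name}->($ch);
        $escapes{$name}++ if $normalized =~ /[^\x{00}-\x{FF}]/;
    }
    printf "%-4s %3d character(s) leave Latin-1\n", $name, $escapes{$name};
}
```

A count of zero means every Latin1 character survives that form
unchanged in repertoire; any non-zero count means some Latin1 text
cannot be written in that form using Latin1 alone.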

In a sense, a legacy encoding is just that, a *legacy*;
i.e., something that should not be reproduced any more.

SADAHIRO Tomoyuki