perl-unicode

Re: Make Encode.pm support the real UTF-8

2004-12-06 10:30:10
Bjoern Hoehrmann <derhoermi(_at_)gmx(_dot_)net> writes:

Now that we have this problem, introducing more places where one needs
to carefully check the documentation for what is considered UTF-8 does
not seem like the best option. Having decode_utf8() and decode(utf8=>...)
mean something different is likely to cause confusion. Maybe
this could go the other way round, i.e. introduce a new encoding
"UTF-8-Strict" or something.

This is certainly more backwards compatible, but do we really want
perl applications to exchange illegal UTF-8 by default?

Hmm, maybe I should ask why you proposed to keep the old behavior of
encode_utf8 in the first place? The change would make more sense to
me if both encode("UTF-8" => ...) and encode_utf8(...) were changed.

This was sort of discussed way back.
Perl uses 'utf8' (lower case, no hyphen) at least partly to allow
'UTF-8' (upper case, hyphen) to be the real one.

So IMHO encode_utf8() can/should stay as the hacky but efficient conversion
to/from perl's internal form. encode('UTF-8',...) can be the "real" one.

Which leaves 'utf-8', 'uTf_8' and other "equivalents" undefined ;-)
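
To make the lax/strict split discussed above concrete, here is a minimal
sketch. It is an analogy written in Python, not Perl's Encode API: the
function names encode_lax, encode_strict and decode_strict are hypothetical,
and Python's "surrogatepass" error handler merely stands in for a lax
encoder that will serialize code points real UTF-8 forbids, such as a
lone surrogate.

```python
# Analogy only: Python codecs standing in for a lax vs. strict UTF-8 encoder.
# None of these names exist in Perl's Encode module; they are hypothetical.

LONE_SURROGATE = "\ud800"  # a code point that well-formed UTF-8 may not carry


def encode_lax(text: str) -> bytes:
    """Lax encoder: serializes every code point, including lone surrogates."""
    return text.encode("utf-8", errors="surrogatepass")


def encode_strict(text: str) -> bytes:
    """Strict encoder: raises UnicodeEncodeError on a lone surrogate."""
    return text.encode("utf-8", errors="strict")


def decode_strict(data: bytes) -> str:
    """Strict decoder: raises UnicodeDecodeError on ill-formed byte sequences."""
    return data.decode("utf-8", errors="strict")


# The lax encoder happily produces the three-byte surrogate form.
lax_bytes = encode_lax(LONE_SURROGATE)

# The strict encoder refuses the same input.
try:
    encode_strict(LONE_SURROGATE)
    strict_encode_rejected = False
except UnicodeEncodeError:
    strict_encode_rejected = True

# A strict decoder on the receiving side refuses what the lax encoder wrote.
try:
    decode_strict(lax_bytes)
    strict_decode_rejected = False
except UnicodeDecodeError:
    strict_decode_rejected = True
```

For ordinary text the two encoders produce identical bytes. They diverge
only on input that is not legal Unicode, which is the case that matters
when data is exchanged with other applications.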