Bjoern Hoehrmann <derhoermi(_at_)gmx(_dot_)net> writes:
Now that we have this problem, introducing more places where one needs
to carefully check the documentation for what is considered UTF-8 does
not seem like the best option. Having decode_utf8() and decode(utf8=>...)
mean something different is likely to cause confusion. Maybe
this could go the other way round, i.e. introduce a new encoding
"UTF-8-Strict" or something.
This is certainly more backwards compatible, but do we really want
perl applications to exchange illegal UTF-8 by default?
Hmm, maybe I should ask why you proposed to keep the old behavior of
encode_utf8 in the first place? The change would make more sense to
me if both encode("UTF-8" => ...) and encode_utf8(...) were changed.
This was sort of discussed way back.
Perl uses 'utf8' (lower case, no hyphen) at least partly so that
'UTF-8' (upper case, with hyphen) can be the real one.
So IMHO encode_utf8() can/should stay as the hacky but efficient
conversion to/from perl's internal form, and encode('UTF-8',...) can be
the "real" one.
Which leaves 'utf-8' 'uTf_8' and other "equivalents" undefined ;-)
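
To make the distinction concrete, here is a minimal sketch of how the two
spellings could be compared. It assumes an Encode release that actually
implements the split described above (lax 'utf8' versus strict 'UTF-8');
on an older Encode both calls may behave identically. The lone surrogate
U+D800 is just one illustrative example of a code point that perl's
internal form can hold but that real UTF-8 forbids, and the variable names
are made up for this example.

```perl
use strict;
use warnings;
no warnings 'utf8';    # chr(0xD800) below is deliberately not a legal scalar value
use Encode qw(encode encode_utf8 find_encoding);

# Ask Encode which encoding object each spelling resolves to.
my $lax_name    = find_encoding('utf8')->name;
my $strict_name = find_encoding('UTF-8')->name;

# A lone surrogate: fine in perl's internal form, illegal in real UTF-8.
my $str = chr(0xD800);

# The "hacky but efficient" path: dump the internal form as bytes.
my $lax_bytes = encode_utf8($str);

# The "real" path: ask Encode to croak on anything it cannot map.
# LEAVE_SRC keeps $str intact, since a CHECK argument may otherwise
# modify the source string in place.
my $strict_bytes = eval {
    encode('UTF-8', $str, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};
my $strict_rejected = defined $strict_bytes ? 0 : 1;

printf "'utf8'  resolves to %s\n", $lax_name;
printf "'UTF-8' resolves to %s\n", $strict_name;
printf "encode_utf8 produced %d byte(s)\n", length $lax_bytes;
printf "strict encode %s the surrogate\n",
    $strict_rejected ? 'rejected' : 'accepted';
```

Without a CHECK argument the strict encoder substitutes a replacement
character rather than dying, so FB_CROAK is used here only to make the
difference easy to observe.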