As you probably know, Perl's version of UTF-8 is not the real thing. I
thought I would hack up a patch to support the encoding as defined by
Unicode. That involves rejecting illegal chars (like surrogates,
"\x{FFFF}" and "\x{FDD0}"), chars above 0x10FFFF, overlong sequences
and such.
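To make the code point side of this concrete, here is a rough sketch in plain Perl of the checks listed above. The helper name is_strict_code_point is hypothetical and not part of Encode. Overlong sequences are not covered, since those can only be detected at the byte level when decoding.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Encode: returns true if $cp is a
# code point that the restricted form of UTF-8 would accept.
sub is_strict_code_point {
    my ($cp) = @_;
    return 0 if $cp > 0x10FFFF;                  # beyond the Unicode range
    return 0 if $cp >= 0xD800 && $cp <= 0xDFFF;  # surrogates
    return 0 if $cp >= 0xFDD0 && $cp <= 0xFDEF;  # noncharacters U+FDD0..U+FDEF
    return 0 if ($cp & 0xFFFE) == 0xFFFE;        # noncharacters U+nFFFE, U+nFFFF
    return 1;
}

# Report the verdict for a few sample code points.
for my $cp (0x41, 0xD800, 0xFDD0, 0xFFFF, 0x10FFFD, 0x110000) {
    printf "U+%04X %s\n", $cp,
        is_strict_code_point($cp) ? "ok" : "rejected";
}
```

Whether noncharacters should be rejected outright or merely flagged is a policy choice. The sketch simply follows the list above.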
Before I do this, I would like to get some feedback on the interface.
My preferred interface would be to make:
    encode("UTF-8", $string)
imply the official restricted form and then have
    encode("UTF-8-Perl", $string)
be used as the name for Perl's relaxed and extended version of the
encoding. The encode_utf8($string) function would continue to be the
same as encode("UTF-8-Perl", $string).
This implies that encode("UTF-8", $string) can start failing while
previously it could not.
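As an illustration of what callers would see, here is a minimal sketch that assumes an Encode in which "UTF-8" already names the restricted form. With the relaxed behaviour described in this message, the same call would simply succeed. FB_CROAK is the existing CHECK constant. The sample string is made up for the example.

```perl
use strict;
use warnings;
no warnings 'utf8';    # silence surrogate warnings on older perls
use Encode qw(encode FB_CROAK);

# Build a string containing a surrogate, which strict UTF-8 forbids.
my $string = "abc" . chr(0xD800);

# encode() may modify its input when CHECK is set, so work on a copy.
my $copy  = $string;
my $bytes = eval { encode("UTF-8", $copy, FB_CROAK) };

if (defined $bytes) {
    print "encoded ", length($bytes), " bytes\n";
}
else {
    print "encode failed: $@";
}
```

Code that calls encode("UTF-8", ...) with a croaking CHECK value would need an eval like this one, or would have to validate its input first.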
Another approach would be to add a FB_STRICT flag that could be passed
with the CHECK argument. I'm not sure this would make sense for any
encoding besides UTF-8 though.
Other suggestions or comments?
Regards,
Gisle