Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
> Please take a look at the (very rough) first draft of Encode, an extension
> for character encoding conversions for Perl 5:
> http://www.iki.fi/jhi/Encode.tgz
> Download, plop it into the Perl 5.7 source directory, unpack,
> re-Configure, rebuild. (Or, if you have a Perl 5.7 in your path,
> cd to ext/Encode, perl Makefile.PL, make).
I did not really understand the interface. It seems to expose too much
of the fact that perl (currently) uses utf8 internally.
I would like to see these convert perl strings to bytes:
to_utf7
to_utf8
    perl enhanced utf8 (does not restrict the range to exclude surrogates,
    chars above 10FFFF, or FFFE and FFFF)
to_utf8_strict
    croaks on bad stuff
to_utf16_be
to_utf16_le
to_utf32_be
to_utf32_le
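
To make the direction concrete, here is a minimal sketch of the to_*
family. It is written in Python, not Perl, and is not the Encode draft's
API; the function names are hypothetical stand-ins for the names listed
above. It assumes Python's "surrogatepass" error handler as a rough
analogue of the relaxed to_utf8, and a raised UnicodeEncodeError as the
analogue of croak. Python strings cannot hold chars above 10FFFF, and
its strict codec does not reject FFFE or FFFF, so only the surrogate
restriction is modeled.

```python
# Hypothetical stand-ins: each takes a character string, returns bytes.

def to_utf7(s: str) -> bytes:
    return s.encode("utf-7")

def to_utf8(s: str) -> bytes:
    # Relaxed: lone surrogates are encoded instead of rejected.
    return s.encode("utf-8", errors="surrogatepass")

def to_utf8_strict(s: str) -> bytes:
    # Strict: raises UnicodeEncodeError on lone surrogates.
    return s.encode("utf-8")

def to_utf16_be(s: str) -> bytes:
    return s.encode("utf-16-be")

def to_utf16_le(s: str) -> bytes:
    return s.encode("utf-16-le")

def to_utf32_be(s: str) -> bytes:
    return s.encode("utf-32-be")

def to_utf32_le(s: str) -> bytes:
    return s.encode("utf-32-le")
```

In every case the input is a string of characters and the output is a
sequence of bytes; nothing about the string's internal representation
shows through.
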
And these convert a sequence of bytes to perl strings:
from_utf8
from_utf8_strict # croak on out-of-range UTF8, over-long sequences, etc.
from_utf16_be
from_utf32_be
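
The reverse direction can be sketched the same way, again in Python
rather than Perl and with hypothetical function names. One assumption
to note: Python's "surrogatepass" handler only relaxes the surrogate
check, so even the relaxed from_utf8 below still rejects over-long
sequences, which may be stricter than what perl would accept.

```python
# Hypothetical stand-ins: each takes bytes, returns a character string.

def from_utf8(b: bytes) -> str:
    # Relaxed: encoded surrogates are accepted.
    return b.decode("utf-8", errors="surrogatepass")

def from_utf8_strict(b: bytes) -> str:
    # Strict: raises UnicodeDecodeError on surrogates, over-long
    # sequences, and code points beyond 10FFFF.
    return b.decode("utf-8")

def from_utf16_be(b: bytes) -> str:
    return b.decode("utf-16-be")

def from_utf32_be(b: bytes) -> str:
    return b.decode("utf-32-be")

def is_rejected(b: bytes) -> bool:
    # Helper for the examples: True if the strict decoder refuses b.
    try:
        from_utf8_strict(b)
    except UnicodeDecodeError:
        return True
    return False
```

For example, is_rejected(b"\xc0\x80") (an over-long encoding of NUL) and
is_rejected(b"\xf4\x90\x80\x80") (a code point beyond 10FFFF) are both
true, while is_rejected(b"\xc3\xa9") is false.
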
You seem to want to define these functions the opposite way. Perhaps
the names are just too confusing.
My previous attempt on this used names like encode_utf8() and
decode_utf8(). They also confused a lot of people.
is_utf8() also seems wrong to me. I believe that the SV invariant
should be that a string marked with the UTF8 flag should not contain
illegal UTF8 sequences. Why is it not so?
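
As a rough model of the invariant I have in mind (Python, not perl
internals; the class and its fields are hypothetical, and "legal" here
means whatever Python's strict decoder accepts, which is narrower than
perl's own definition):

```python
class FlaggedString:
    """Hypothetical model of a scalar: raw bytes plus a UTF8 flag."""

    def __init__(self, raw: bytes, utf8: bool = False):
        if utf8:
            # Enforce the invariant when the flag is set: decoding
            # raises UnicodeDecodeError if the bytes are ill-formed.
            raw.decode("utf-8")
        self.raw = raw
        self.utf8 = utf8


def is_utf8(s: FlaggedString) -> bool:
    # With the invariant enforced at construction, the flag alone is a
    # sufficient answer; no rescan of the bytes is needed.
    return s.utf8
```

If the invariant holds, a flagged string with illegal sequences cannot
exist, so there is nothing left for a separate validity check to find.
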
Regards,
Gisle