On Tue, Sep 12, 2000 at 12:24:50AM +0200, Gisle Aas wrote:
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
Please take a look at the (very rough) first draft of Encode, an extension
for character encoding conversions for Perl 5:
http://www.iki.fi/jhi/Encode.tgz
Download, plop it into the Perl 5.7 source directory, unpack,
re-Configure, rebuild. (Or, if you have a Perl 5.7 in your path,
cd to ext/Encode, perl Makefile.PL, make).
I did not really understand the interface. It seems like you expose
too much of the fact that perl (currently) uses utf8 internally.
Before we have the character mapping tables we don't have a choice --
and after that it shouldn't matter anyway since we should have
from_to(). Then people will never see the underlying utf8ness.
(Unless, of course, they say from_to('latin1', 'utf8'), but that's
transparent and orthogonal.)
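
For illustration only, here is a minimal sketch of what a from_to() call looks like in the Encode module as it is distributed with current perls. That is an assumption on my part: the draft tarball discussed in this thread may well have had a different signature. The sample bytes are made up.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# "cafe" with an acute e, as four Latin-1 bytes (made-up sample data).
my $octets = "caf\xE9";

# Convert the buffer in place from Latin-1 to UTF-8.
# from_to() returns the new length in bytes, or undef on failure.
my $len = from_to($octets, 'iso-8859-1', 'UTF-8');

# Show the resulting bytes in hex.
printf "%d bytes: %s\n", $len,
    join ' ', map { sprintf '%02X', ord } split //, $octets;
```

The caller only ever handles bytes in one encoding and bytes in another; perl's internal representation never shows up, which is the point being made above.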
I would like to see these convert perl strings to bytes:
to_utf7
to_utf8
...
And these convert a sequence of bytes to perl strings:
from_utf8
You seem to want to define these functions the opposite way. Perhaps
*Now* I understand why I couldn't ever figure out how to use Unicode::Map8 :-)
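
To make the two directions concrete, here is a small sketch using encode() and decode() from the present-day Encode module. These stand in for the to_utf8/from_utf8 names proposed above, which are not assumed to exist anywhere; the draft under discussion may have differed.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $string = "\x{263A}";                # one character (WHITE SMILING FACE)
my $bytes  = encode('UTF-8', $string);  # perl string -> bytes  ("to" direction)
my $back   = decode('UTF-8', $bytes);   # bytes -> perl string  ("from" direction)

printf "characters: %d, bytes: %d, round trip ok: %s\n",
    length($string), length($bytes), ($back eq $string ? 'yes' : 'no');
```

Under this convention "to" always produces bytes and "from" always consumes them, so the function name alone says which side of the boundary you are on.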
The is_utf8() also seems wrong to me. I believe that the SV invariant
should be that a string marked with the UTF8 flag should not contain
illegal UTF8 sequences. Why is it not so?
I'm being paranoid. Keeps me alive.
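
One way to keep that invariant at the boundary, sketched with the current Encode module rather than the draft: decode strictly and refuse malformed input, so that nothing illegal is ever marked as UTF8. The helper name strict_from_utf8 is hypothetical and exists only for this example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return a character string, or undef if the
# input bytes are not well-formed UTF-8.
sub strict_from_utf8 {
    my ($bytes) = @_;    # work on a copy; FB_CROAK may modify its argument
    my $string = eval { decode('UTF-8', $bytes, Encode::FB_CROAK()) };
    return $string;      # undef if decode() died
}

my $good = strict_from_utf8("\xC3\xA9");    # well-formed: U+00E9
my $bad  = strict_from_utf8("\xC3\x28");    # malformed: C3 needs a continuation byte

print defined $good ? "good: accepted\n" : "good: rejected\n";
print defined $bad  ? "bad: accepted\n"  : "bad: rejected\n";
```

With input checked like this, a separate validity test on flagged strings is a safety net rather than a necessity.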
Regards,
Gisle
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen