Gisle Aas <gisle(_at_)ActiveState(_dot_)com> writes:
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:
Please take a look at the (very rough) first draft of Encode, an extension
for character encoding conversions for Perl 5:
http://www.iki.fi/jhi/Encode.tgz
Download, plop it into the Perl 5.7 source directory, unpack,
re-Configure, rebuild. (Or, if you have a Perl 5.7 in your path,
cd to ext/Encode, perl Makefile.PL, make).
I did not really understand the interface. It seems to expose too much
of the fact that perl (currently) uses utf8 internally.
I would like to see these convert perl strings to bytes:
to_utf7
to_utf8
And these convert a sequence of bytes to perl strings:
from_utf8
from_utf8_strict # croak on out-of-range UTF8, over-long sequences, etc.
from_utf16_be
from_utf32_be
You seem to want to define these functions the opposite way. Perhaps
the names are just too confusing.
I can see why either way round makes a kind of sense.
I think the 'from_' names are more confusing than the 'to_' names.
The snag with either is that the "other" side is implicit.
My stab at names would be:
utf8bytes_to_chars()
chars_to_utf8bytes()
With variants for the other utf* if necessary.
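To make the naming concrete, here is a minimal sketch of the two functions. It assumes the encode() and decode() functions of the Encode module as it later shipped with Perl 5.8, not the draft under discussion. The wrapper names are just the ones proposed above; they are not part of any real Encode API.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical wrappers using the names proposed above.
# Characters in, UTF-8 encoded bytes out.
sub chars_to_utf8bytes {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

# UTF-8 encoded bytes in, characters out.
# FB_CROAK makes malformed input die instead of being silently replaced;
# LEAVE_SRC keeps decode() from modifying its input.
sub utf8bytes_to_chars {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

my $chars = "caf\x{e9} \x{263A}";          # 6 characters, one above 8 bits
my $bytes = chars_to_utf8bytes($chars);    # 9 bytes of UTF-8
my $back  = utf8bytes_to_chars($bytes);

printf "%d chars, %d bytes\n", length($chars), length($bytes);
print "round trip ok\n" if $back eq $chars;
```

Both sides of the conversion are spelled out in each name, so there is no implicit "other" side to guess at.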
The important thing to remember is that internally perl has sequences of
characters. And in perl-5.6+ characters can be bigger than 8 bits.
The fact that we internally represent the characters with UTF8 encodings
should be irrelevant to the API. The only place it should matter is that
some of the "give me the string like this" functions will not
have to _do_ very much.
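A small sketch of why the internal representation should not leak into the API. It assumes the utf8::upgrade() and utf8::is_utf8() functions found in later perls (5.8 and up), and the variable names are only illustrative. The same characters can be stored either way without the program being able to tell from the string's value.

```perl
use strict;
use warnings;

my $native   = "caf\x{e9}";   # 4 characters, all below 256
my $upgraded = $native;
utf8::upgrade($upgraded);     # switch only the internal storage to UTF-8

# The strings are the same sequence of characters either way.
print "same characters\n" if $native eq $upgraded;
print "lengths: ", length($native), " and ", length($upgraded), "\n";

# Only this introspection call can see the difference in storage.
print "storage differs\n"
    if utf8::is_utf8($upgraded) && !utf8::is_utf8($native);
```

A chars-to-UTF-8-bytes function handed the second string has very little to do, while the first needs a real conversion, but the caller should get the same bytes back in both cases.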
--
Nick Ing-Simmons