On Mon, Sep 11, 2000 at 04:52:33PM -0500, Martin_Hosken(_at_)sil(_dot_)org wrote:
>> There isn't anything to test this with (I did say 'very rough'). Please
>> read Encode.pm. Mainly I'm interested in hearing comments on whether this
>> is a good interface, something that could be used to replace Unicode::Map8
>> (lots of table-driven conversions, for 8-bit legacy character sets) and,
>> when we feel up to it, Unicode::Map (lots of algorithmic conversions, for
>> East Asian encodings). AFAIU, the current user interface of ::Map8 isn't
>> what we want?
> Two comments:
> 1. Is there any chance of a null mapping to convert a string containing
> UTF8 but not marked as such to one so marked, and vice versa?
"Define 'containing UTF8'. This string contains UTF8."
I purposefully left that one out. I think it would be too
low-level and dangerous. The moment someone turns on the UTF8 flag on
data that isn't UTF-8, we have invalid data, and anything that expects the
UTF8 flag to mean something will be misled. Red core dumps and vert data
corruption rampant in field argent: that's the coat of arms of an
unhappy coder.
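To make the danger concrete, here is a minimal sketch in later Perl (5.8 or newer, where the core Encode module and the utf8:: functions exist; none of this was available when this message was written). It contrasts a checked conversion, utf8::decode, with a raw flag flip, Encode::_utf8_on. Encode documents _utf8_on as internal and as not checking its input, which is exactly the hazard described above. The sample strings are illustrative only.

```perl
use strict;
use warnings;
use Encode ();

# "caf\xE9" is "café" in Latin-1. The byte \xE9 is a UTF-8 lead byte with
# no continuation bytes after it, so these octets are NOT valid UTF-8.
my $latin1 = "caf\xE9";

# Checked route: utf8::decode works in place and returns false,
# leaving the string untouched, when the octets are not well-formed.
my $checked = $latin1;
my $ok = utf8::decode($checked);
print $ok ? "decoded\n" : "rejected: not well-formed UTF-8\n";

# The same call on real UTF-8 octets succeeds and yields 4 characters.
my $good = "caf\xC3\xA9";
my $good_ok = utf8::decode($good);
print "valid input: ", length($good), " characters\n";

# Blind route: Encode::_utf8_on only sets the flag; nothing is checked.
# The string now claims to be UTF-8 but still holds malformed octets,
# so we deliberately do not print it or take its length.
my $blind = $latin1;
Encode::_utf8_on($blind);
print Encode::is_utf8($blind) ? "flag on, data still malformed\n" : "flag off\n";
```

The checked route refuses the Latin-1 bytes; the blind route happily accepts them, and every later consumer that trusts the flag inherits the corruption.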
(If someone really wants to diddle in bit-level UTF8, he/she/it already can.
Witness the following one-liners for Latin-1 to UTF-8 and vice versa:
s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
s/([\xC2\xC3])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
)
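The one-liners above can be wrapped into a round-trip check. This is a sketch only: the helper names latin1_to_utf8_bytes and utf8_bytes_to_latin1 are hypothetical, both assume plain byte strings (UTF8 flag off), and the reverse one-liner is used with its lead-byte class narrowed to [\xC2\xC3], since only those lead bytes encode U+0080..U+00FF. The final cross-check uses utf8::encode, which exists only in later Perls (5.8+).

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the first one-liner: Latin-1 octets in,
# UTF-8 octets out. Assumes a plain byte string (UTF8 flag off).
sub latin1_to_utf8_bytes {
    my ($s) = @_;
    # Each byte 0x80..0xFF becomes the two bytes 110000xx 10xxxxxx.
    $s =~ s/([\x80-\xFF])/chr(0xC0 | ord($1) >> 6) . chr(0x80 | ord($1) & 0x3F)/eg;
    return $s;
}

# Hypothetical helper wrapping the reverse one-liner. Only \xC2 and \xC3
# lead bytes encode U+0080..U+00FF, so other sequences are left alone.
sub utf8_bytes_to_latin1 {
    my ($s) = @_;
    $s =~ s/([\xC2\xC3])([\x80-\xBF])/chr((ord($1) << 6) & 0xC0 | ord($2) & 0x3F)/eg;
    return $s;
}

my $latin1 = "caf\xE9";                      # "café" in Latin-1
my $utf8   = latin1_to_utf8_bytes($latin1);
printf "%vX\n", $utf8;                       # 63.61.66.C3.A9
my $back   = utf8_bytes_to_latin1($utf8);
print $back eq $latin1 ? "round trip ok\n" : "round trip FAILED\n";

# Cross-check against the core in-place encoder of later Perls (5.8+).
my $copy = $latin1;
utf8::encode($copy);
print $copy eq $utf8 ? "matches utf8::encode\n" : "differs from utf8::encode\n";
```

Narrowing the class matters: with [\xC0-\xDF], a sequence such as "\xC4\x80" (U+0100) would be silently mangled into a single wrong byte instead of being left untouched.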
Later, when we've got IO disciplines, we should be able to mark
an input handle as being in UTF-8.
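For reference, Perl 5.8 later delivered this as PerlIO layers. The sketch below assumes 5.8 or newer and reads from an in-memory filehandle so that it runs without a real file; with a file one would write open($fh, '<:encoding(UTF-8)', $path) instead.

```perl
use strict;
use warnings;

# UTF-8 octets for "café", as they might arrive from a file or socket.
my $octets = "caf\xC3\xA9";

# An in-memory filehandle stands in for a real file. The :encoding(UTF-8)
# layer decodes the octets into characters as they are read.
open(my $fh, '<:encoding(UTF-8)', \$octets) or die "open: $!";
my $line = <$fh>;
close($fh);

# The program now sees 4 characters, not 5 bytes.
printf "%d characters, last is U+%04X\n", length($line), ord(substr($line, -1));
```

Perl's plain :utf8 layer is closer to the literal "mark the handle" idea, but the PerlIO documentation notes that it does not validate the incoming bytes, which brings back the flag-flipping concern raised earlier in this message.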
Apropos, I think in-place variants of to_utf8() and from_utf8()
might be in order: utf8_on() and utf8_off(), possibly. Maybe move
some of the functions to utf8.c: sv_cvtpv_to_utf8(), sv_cvtpv_from_utf8(),
newSVpv_utf8(), newSVsv_to_utf8(), and newSVsv_from_utf8(), maybe.
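The function names above are proposals. For comparison only, later Perls (5.8+) ship in-place conversions in the utf8:: namespace (utf8::upgrade, utf8::downgrade, utf8::encode, utf8::decode, utf8::is_utf8), backed by sv_utf8_upgrade() and sv_utf8_downgrade() in the C API; the raw flag flips closest to utf8_on()/utf8_off() are Encode::_utf8_on and Encode::_utf8_off. A minimal sketch of the in-place upgrade/downgrade pair, with illustrative strings:

```perl
use strict;
use warnings;

my $s = "caf\xE9";        # 4 characters, stored one byte each

# utf8::upgrade converts the internal storage to UTF-8 in place. The
# string's characters (and therefore its length) do not change.
utf8::upgrade($s);
printf "upgraded: flag=%d length=%d\n", utf8::is_utf8($s) ? 1 : 0, length($s);

# utf8::downgrade goes back to one byte per character, also in place.
# With a true second argument it returns false instead of dying when a
# character above 0xFF makes that impossible.
my $down_ok = utf8::downgrade($s, 1);
printf "downgraded: ok=%d flag=%d\n", $down_ok ? 1 : 0, utf8::is_utf8($s) ? 1 : 0;

my $wide = "caf\x{263A}";  # U+263A cannot be stored in a single byte
my $wide_ok = utf8::downgrade($wide, 1);
print $wide_ok ? "wide string downgraded\n" : "wide string kept as UTF-8\n";
```

Unlike a bare flag flip, these calls change the representation and the flag together, so the string always means the same characters before and after.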
--
$jhi++; # http://www.iki.fi/jhi/
# There is this special biologist word we use for 'stable'.
# It is 'dead'. -- Jack Cohen