perl-unicode

Re: [EXPERIMENTAL] 1st draft of Encode

2000-09-11 15:09:56
On Mon, Sep 11, 2000 at 04:52:33PM -0500, Martin_Hosken(_at_)sil(_dot_)org wrote:
There isn't anything to test this with (I did say 'very rough'). Please read
Encode.pm.  Mainly I'm interested in hearing comments on whether this is a good
interface, something that could be used to replace Unicode::Map8 (lots of
table-driven conversions, for 8-bit legacy character sets), and when we feel up
to it, Unicode::Map (lots of algorithmic conversions, for Eastern Asian
encodings).  AFAIU, the current user interface of ::Map8 isn't what we want?


Two comments:

1. Is there any chance of a null mapping to convert a string that contains
UTF8 but is not marked as such into one that is so marked, and vice versa?

"Define 'containing UTF8'.  This string contains UTF8."

I purposefully left that one out.  I think that would be too
low-level and dangerous.  The moment someone turns on the UTF8 flag on
data that isn't UTF8, we have invalid data, and anything that expects the
UTF8 flag to mean something will be misled.  Red core dumps and vert data
corruption rampant in field argent, that's the coat of arms of an
unhappy coder.
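
A minimal sketch of that failure mode, assuming a later Perl (5.8 or newer)
rather than the draft Encode.pm discussed here.  Encode::_utf8_on(),
utf8::is_utf8(), utf8::valid() and utf8::decode() come from those later
releases; the variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Latin-1 bytes for "cafe" with an e-acute; the lone \xE9 is not
# well-formed UTF-8 (it announces a 3-byte sequence that never arrives).
my $bytes = "caf\xE9";

# Blindly switch the flag on.  Encode::_utf8_on is documented as internal
# and performs no validation at all.
my $forced = $bytes;
Encode::_utf8_on($forced);
my $flag_set    = utf8::is_utf8($forced) ? 1 : 0;   # the flag is on
my $well_formed = utf8::valid($forced)   ? 1 : 0;   # the buffer is checked here

# The checked route: utf8::decode validates first and returns false
# (leaving the string alone) when the bytes are not well-formed UTF-8.
my $checked = $bytes;
my $decoded = utf8::decode($checked) ? 1 : 0;

print "flag=$flag_set well_formed=$well_formed decoded=$decoded\n";
```

The forced copy claims to be UTF8 while its buffer is malformed, which is
exactly the state that misleads any code trusting the flag.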

(If someone really wants to diddle in bit-level UTF8, he/she/it already can.
 Witness the following one-liners for Latin-1 to UTF-8 and vice versa:
    s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    s/([\xC0-\xDF])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
)
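
For the curious, here is a round trip through those two substitutions, as a
sketch only.  The sub names and the sample string are invented for
illustration.  Note that the second substitution is only meaningful when the
input really is Latin-1 encoded as UTF-8 (lead bytes \xC2 and \xC3); anything
above U+00FF loses bits in the 0xC0 mask.

```perl
use strict;
use warnings;

# Latin-1 bytes -> UTF-8 bytes: each high byte becomes a two-byte sequence.
sub latin1_to_utf8_bytes {
    my ($s) = @_;
    $s =~ s/([\x80-\xFF])/chr(0xC0|ord($1)>>6).chr(0x80|ord($1)&0x3F)/eg;
    return $s;
}

# UTF-8 bytes -> Latin-1 bytes: fold each two-byte sequence back into one byte.
sub utf8_bytes_to_latin1 {
    my ($s) = @_;
    $s =~ s/([\xC0-\xDF])([\x80-\xBF])/chr(ord($1)<<6&0xC0|ord($2)&0x3F)/eg;
    return $s;
}

my $latin1 = "caf\xE9";                      # e-acute as a single byte
my $utf8   = latin1_to_utf8_bytes($latin1);  # e-acute as \xC3\xA9
my $back   = utf8_bytes_to_latin1($utf8);

print unpack("H*", $latin1), "\n";   # 636166e9
print unpack("H*", $utf8),   "\n";   # 636166c3a9
print unpack("H*", $back),   "\n";   # 636166e9
```

Neither direction touches the UTF8 flag; both work purely on the bytes.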

Later, when we've got IO disciplines, we should be able to mark
an input handle as being in utf8.
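
IO disciplines eventually shipped as PerlIO layers in Perl 5.8, so here is a
sketch of what marking an input handle looks like there.  It assumes that
later syntax, not anything available at the time of writing, and it reads
from an in-memory string only to stay self-contained.

```perl
use strict;
use warnings;

# UTF-8 encoded bytes: "cafe" with an e-acute, plus a newline.
my $utf8_bytes = "caf\xC3\xA9\n";

# Mark the input handle as UTF-8 at open time; decoding then happens
# on every read, and the data arrives as characters.
open(my $in, '<:encoding(UTF-8)', \$utf8_bytes)
    or die "cannot open in-memory handle: $!";
my $line = <$in>;
close($in);
chomp $line;

my $chars = length($line);            # counts characters, not bytes
my $last  = ord(substr($line, -1));   # code point of the final character

printf "chars=%d last=U+%04X\n", $chars, $last;   # chars=4 last=U+00E9
```

The caller never touches the flag; the handle does the decoding and the
validation.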

A propos, I think in-place variants of to_utf8() and from_utf8()
might be in order.  utf8_on() and utf8_off(), possibly.  Maybe move
some of the functions to utf8.c: sv_cvtpv_to_utf8(), sv_cvtpv_from_utf8(),
newSVpv_utf8(), newSVsv_to_utf8(), newSVsv_from_utf8(), maybe.

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen