perl-unicode

Re: [EXPERIMENTAL] 1st draft of Encode

2000-09-11 15:40:30
Jarkko Hietaniemi <jhi(_at_)iki(_dot_)fi> writes:

> Please take a look at the (very rough) first draft of Encode, an extension
> for character encoding conversions for Perl 5:
>
>       http://www.iki.fi/jhi/Encode.tgz
>
> Download, plop it into the Perl 5.7 source directory, unpack,
> re-Configure, rebuild.  (Or, if you have a Perl 5.7 in your path,
> cd to ext/Encode, perl Makefile.PL, make).

I did not really understand the interface.  It seems to expose too much
of the fact that perl (currently) uses utf8 internally.

I would like to see these convert perl strings to bytes:

  to_utf7
  to_utf8

     perl's extended utf8 (does not restrict the range: surrogates,
     chars above 10FFFF, and FFFE/FFFF are all let through)

  to_utf8_strict

     croaks on bad stuff (surrogates, chars above 10FFFF, FFFE, FFFF)

  to_utf16_be
  to_utf16_le
  to_utf32_be
  to_utf32_le
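
To pin down the direction meant here (characters in, bytes out), this is
a rough sketch in Python rather than Perl. The function names simply
mirror the list above and are hypothetical; this is not the Encode
interface. Python's "surrogatepass" error handler is only an
approximation of the lax to_utf8: it lets surrogates through, but Python
strings cannot hold anything above 10FFFF in the first place, and strict
mode here does not reject FFFE/FFFF.

```python
# Hypothetical stand-ins for the proposed to_* names (string -> bytes).
# Assumption: Python's codecs approximate the intended behaviour.

def to_utf8(s: str) -> bytes:
    # Lax: lone surrogates are encoded instead of raising.
    return s.encode("utf-8", errors="surrogatepass")

def to_utf8_strict(s: str) -> bytes:
    # Strict: raises UnicodeEncodeError on a lone surrogate ("croaks").
    return s.encode("utf-8")

def to_utf16_be(s: str) -> bytes:
    return s.encode("utf-16-be")   # big-endian, no byte order mark

def to_utf16_le(s: str) -> bytes:
    return s.encode("utf-16-le")   # little-endian, no byte order mark

def to_utf32_be(s: str) -> bytes:
    return s.encode("utf-32-be")

def to_utf32_le(s: str) -> bytes:
    return s.encode("utf-32-le")
```

The point is only that the argument is always a string of characters and
the result is always a sequence of bytes, whatever perl uses internally.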

And these convert a sequence of bytes to perl strings:

  from_utf8
  from_utf8_strict    # croak on out-of-range UTF8, over-long sequences, etc.
  from_utf16_be
  from_utf32_be
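
The reverse direction (bytes in, characters out) can be sketched the
same way, again in Python with hypothetical names taken from the list
above. One caveat: the lax from_utf8 below accepts encoded surrogates
but still rejects over-long sequences, so it is stricter than what perl's
own utf8 handling may allow.

```python
# Hypothetical stand-ins for the proposed from_* names (bytes -> string).
# Assumption: Python's codecs approximate the intended behaviour.

def from_utf8(b: bytes) -> str:
    # Lax: byte sequences that encode surrogates are accepted.
    return b.decode("utf-8", errors="surrogatepass")

def from_utf8_strict(b: bytes) -> str:
    # Strict: raises UnicodeDecodeError on surrogates, over-long
    # sequences, and other malformed input ("croaks").
    return b.decode("utf-8")

def from_utf16_be(b: bytes) -> str:
    return b.decode("utf-16-be")

def from_utf32_be(b: bytes) -> str:
    return b.decode("utf-32-be")

def is_strict_utf8(b: bytes) -> bool:
    # Helper for illustration: True if the bytes decode cleanly.
    try:
        from_utf8_strict(b)
        return True
    except UnicodeDecodeError:
        return False
```

For example, is_strict_utf8(b"\xc0\xaf") is False because C0 AF is an
over-long encoding of "/".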

You seem to want to define these functions the opposite way round.
Perhaps the names are just too confusing.

My previous attempt at this used names like encode_utf8() and
decode_utf8().  They also confused a lot of people.

The is_utf8() function also seems wrong to me.  I believe the SV
invariant should be that a string marked with the UTF8 flag never
contains illegal UTF8 sequences.  Why is that not so?

Regards,
Gisle