Re: how to utf8::encode and ::decode in 5.6.1


On Mon, 5 Aug 2002 22:17:10 +0100
Nicholas Clark <nick(_at_)unfortu(_dot_)net> wrote:

> I'm trying to backport ExtUtils::Constant from 5.8.0 to work on perl pre
> 5.8.0. Currently ExtUtils::Constant is using utf8::encode and utf8::decode
> to convert Unicode strings to and from their internal byte representation
> for testing purposes.
>
> For 5.005_03 I don't have a problem - I just skip all the Unicode tests! :-)
> However, for 5.6.1 (and 5.6.0) I do. I can't work out how to (legally!) get
> perl to give me the utf8 bytes that represent the Unicode strings, or how
> to translate a sequence of utf8 bytes back into a perl Unicode string.
>
> So how should I write utf8::encode and utf8::decode for 5.6.1 and 5.6.0?
> I can cope if a different solution is needed on both.


How about these codelets?
(Sorry, I haven't tried them on 5.6.0.)

  # encode: Unicode string -> its UTF-8 bytes
  $encoded = pack('C*', unpack 'C*', $string);
  # decode: UTF-8 bytes -> Unicode string
  $decoded = pack('U*', unpack 'U0U*', $string);
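
A minimal sketch of how the two codelets could be wrapped so that the same
test code runs on both 5.6.x and 5.8.0. The names compat_encode and
compat_decode are hypothetical (they are not part of ExtUtils::Constant or
any other module), and the pack/unpack fallback is taken from the codelets as
written, on the assumption that it is only reached on perls without
utf8::encode and utf8::decode.

```perl
use strict;

# Hypothetical helpers; the names are illustrative only.
# Where the built-in utf8::encode/utf8::decode exist (5.8.0 and later)
# they are used; otherwise the pack/unpack codelets are the fallback,
# which is assumed to be needed on 5.6.x only.

sub compat_encode {
    my ($string) = @_;              # work on a copy of the argument
    if (defined &utf8::encode) {
        utf8::encode($string);      # in place: characters -> UTF-8 bytes
        return $string;
    }
    return pack('C*', unpack 'C*', $string);
}

sub compat_decode {
    my ($bytes) = @_;               # work on a copy of the argument
    if (defined &utf8::decode) {
        utf8::decode($bytes);       # in place: UTF-8 bytes -> characters
        return $bytes;
    }
    return pack('U*', unpack 'U0U*', $bytes);
}

my $smiley    = "\x{263A}";         # WHITE SMILING FACE
my $bytes     = compat_encode($smiley);
my $roundtrip = compat_decode($bytes);

print length($bytes), " byte(s); round trip ",
      ($roundtrip eq $smiley ? "ok" : "not ok"), "\n";
```

Testing with defined &utf8::encode rather than comparing $] keeps the choice
tied to what the running perl actually provides.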

We can use this instead of utf8::upgrade($s):

  $upgraded = pack('U*', unpack 'C*', $string);

But this may be safer if it is unclear whether $string has the
UTF8 flag on or off:

  $upgraded = pack('U*').$string;
  # pack('U*') generates an empty string with UTF8 flag on.

But we cannot use this instead of utf8::downgrade($s):

  $downgraded = pack('C*', unpack 'U*', $string);
  # OK on 5.8.0,
  # but on 5.6.1 it sometimes gives "Malformed UTF-8 character".

So this would be better, perhaps:

  $downgraded = pack('C*', unpack 'U*', pack('U*').$string);
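
A small sketch for checking the upgrade and downgrade codelets on whatever
perl is at hand. The sample string is made up for illustration, and the flag
report relies on utf8::is_utf8, which is assumed to be available only from
5.8.1 onwards, so it is skipped on older perls.

```perl
use strict;

# Made-up sample: a Latin-1 byte string, so the UTF8 flag starts off.
my $string = "caf\xE9";

# Upgrade by concatenating onto the empty string from pack('U*').
my $upgraded = pack('U*') . $string;

# Downgrade again via the code points, as in the last codelet.
my $downgraded = pack('C*', unpack 'U*', pack('U*') . $string);

# Neither conversion should change the characters themselves.
print "upgraded   ", ($upgraded   eq $string ? "same" : "differs"), "\n";
print "downgraded ", ($downgraded eq $string ? "same" : "differs"), "\n";

# Report the UTF8 flag only where utf8::is_utf8 exists.
if (defined &utf8::is_utf8) {
    for my $pair (['string',     $string],
                  ['upgraded',   $upgraded],
                  ['downgraded', $downgraded]) {
        printf "%-10s UTF8 flag %s\n",
               $pair->[0], (utf8::is_utf8($pair->[1]) ? "on" : "off");
    }
}
```

On 5.6.x, where utf8::is_utf8 is missing, Devel::Peek's Dump() is one way to
inspect the flag instead.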

The test.t of my Unicode::Normalize uses pack() and unpack() heavily,
because the tests have to pass on both Perl 5.6.1 and 5.8.0,
with both the XS and the non-XS implementation.
This technique does not seem to be portable to EBCDIC, though. :-/

Regards,
SADAHIRO Tomoyuki