On Mon, 5 Aug 2002 22:17:10 +0100
Nicholas Clark <nick(_at_)unfortu(_dot_)net> wrote:
> I'm trying to backport ExtUtils::Constant from 5.8.0 to work on perl pre
> 5.8.0. Currently ExtUtils::Constant is using utf8::encode and utf8::decode
> to convert Unicode strings to and from their internal byte representation
> for testing purposes.
> For 5.005_03 I don't have a problem - I just skip all the Unicode tests! :-)
> However, for 5.6.1 (and 5.6.0) I do. I can't work out how to (legally!) get
> perl to give me the utf8 bytes that represent the Unicode strings, or how
> to translate a sequence of utf8 bytes back into a perl Unicode string.
> So how should I write utf8::encode and utf8::decode for 5.6.1 and 5.6.0?
> I can cope if a different solution is needed on both.
How about these codelets?
(Sorry, I haven't tried them on 5.6.0.)
    $encoded = pack('C*', unpack 'C*', $string);
    $decoded = pack('U*', unpack 'U0U*', $string);
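For comparison, the behaviour these codelets try to reproduce can be seen with the real utf8::encode and utf8::decode on 5.8.0 or later. This is only a sketch: the sample character U+263A and the variable names are chosen here for illustration and are not taken from ExtUtils::Constant.

```perl
use strict;
use warnings;

# Assumption: perl 5.8.0 or later, where utf8::encode and utf8::decode
# are built in. The sample character is chosen only for illustration.
my $string = "\x{263A}";               # one character, U+263A

my $encoded = $string;
utf8::encode($encoded);                # in place: now the UTF-8 bytes
my $hex = join ' ', map { sprintf '%02X', ord($_) } split //, $encoded;

my $decoded = $encoded;
utf8::decode($decoded);                # in place: back to one character

print "bytes: $hex\n";                 # prints "bytes: E2 98 BA"
print "round trip ok\n" if $decoded eq $string;
```

A backported encode/decode for 5.6.x would presumably need to give the same byte sequence and the same round trip for the test strings.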
We can use this instead of utf8::upgrade($s):
    $upgraded = pack('U*', unpack 'C*', $string);
But the following may be safer
if it is unclear whether $string has the UTF8 flag on or off:
    $upgraded = pack('U*').$string;
    # pack('U*') generates an empty string with the UTF8 flag on.
However, this cannot be used instead of utf8::downgrade($s):
    $downgraded = pack('C*', unpack 'U*', $string);
    # OK on 5.8.0,
    # but on 5.6.1 it sometimes gives "Malformed UTF-8 character".
So this would be better, perhaps:
    $downgraded = pack('C*', unpack 'U*', pack('U*').$string);
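What the pack/unpack substitutes above have to imitate can be checked against the real utf8::upgrade and utf8::downgrade on 5.8.0 or later. The following is a minimal sketch with illustrative sample strings; it says nothing about how 5.6.x behaves.

```perl
use strict;
use warnings;

# Assumption: perl 5.8.0 or later. The sample strings are illustrative only.
my $string = "\xE9";                   # one character, U+00E9

my $upgraded = $string;
utf8::upgrade($upgraded);              # internal storage changes, value does not
print "still equal after upgrade\n" if $upgraded eq $string;
print "length is ", length($upgraded), "\n";

my $downgraded = $upgraded;
utf8::downgrade($downgraded);          # back to one byte per character
print "still equal after downgrade\n" if $downgraded eq $string;

# A character above 0xFF cannot be downgraded; with a true second
# argument utf8::downgrade returns false instead of dying.
my $wide = "\x{263A}";
my $ok = utf8::downgrade($wide, 1);
print "cannot downgrade U+263A\n" unless $ok;
```

Upgrading and downgrading change only the internal representation, so the string compares equal before and after. That is the difference from encode and decode, which change the characters the string contains.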
The test.t of my Unicode::Normalize uses pack() and unpack() in many places,
because the tests have to pass on both Perl 5.6.1 and 5.8.0,
and with both the XS and the non-XS implementation;
but this technique does not seem to be portable to EBCDIC. :-/
Regards,
SADAHIRO Tomoyuki