perl-unicode

Re: how to utf8::encode and ::decode in 5.6.1

2002-08-10 16:30:06
On Tue, Aug 06, 2002 at 10:36:09PM +0900, SADAHIRO Tomoyuki wrote:

On Mon, 5 Aug 2002 22:17:10 +0100
Nicholas Clark <nick(_at_)unfortu(_dot_)net> wrote:

I'm trying to backport ExtUtils::Constant from 5.8.0 to work on perl pre
5.8.0. Currently ExtUtils::Constant is using utf8::encode and utf8::decode
to convert Unicode strings to and from their internal byte representation
for testing purposes.

For 5.005_03 I don't have a problem - I just skip all the Unicode tests! :-)
However, for 5.6.1 (and 5.6.0) I do. I can't work out how to (legally!) get
perl to give me the utf8 bytes that represent the Unicode strings, or how
to translate a sequence of utf8 bytes back into a perl Unicode string.

So how should I write utf8::encode and utf8::decode for 5.6.1 and 5.6.0?
I can cope if a different solution is needed on both.

How about these codelets?
(Sorry, I haven't tried them on 5.6.0.)

Thanks. They seem to work very well on 5.6.1.
After spending a couple of nights fighting all the Unicode bugs and
unhelpfulness in 5.6.1 with various workarounds, I gave up on the idea of
5.6.0 - it's just too much trouble.

The test.t of my Unicode::Normalize relies heavily on pack() and unpack(),
because the tests have to pass on both Perl 5.6.1 and 5.8.0, with both the
XS and the non-XS implementation;
but this technique does not seem to be portable to EBCDIC. :-/

I've not got access to EBCDIC, so I've no idea what will go wrong.

However, ExtUtils-Constant-0.13.tar.gz is currently working its way round
CPAN.

I couldn't find any sort of tied hash implementation on CPAN that would
let me reliably mix UTF-8 and 8 bit scalars as hash keys on 5.6.1, so I
knocked up a quick one based on your unpack/pack code. (Although I'm
storing the hash keys as a string of BER compressed integers rather than
UTF-8 bytes.)
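
To make the idea concrete, here is a minimal sketch of such a tied hash. It
is not the code that ships in ExtUtils::Constant: the package name
BERKeyHash and the helpers _enc and _dec are made up for illustration, and
the sketch assumes a perl where unpack 'U*' returns the code points of both
8 bit and UTF-8 strings. That holds on current perls; it has not been
checked on 5.6.1 here.

```perl
package BERKeyHash;    # hypothetical name, for illustration only
use strict;

# Key -> one BER compressed integer ("w") per code point.
# The result is a plain byte string, so it is safe as a real hash key.
sub _enc { my ($key) = @_; return pack 'w*', unpack 'U*', $key }

# BER compressed integers -> key rebuilt from its code points.
sub _dec { my ($ber) = @_; return pack 'U*', unpack 'w*', $ber }

sub TIEHASH { my ($class) = @_; return bless {}, $class }

sub STORE  { my ($self, $key, $val) = @_; $self->{ _enc($key) } = $val }
sub FETCH  { my ($self, $key) = @_; return $self->{ _enc($key) } }
sub EXISTS { my ($self, $key) = @_; return exists $self->{ _enc($key) } }
sub DELETE { my ($self, $key) = @_; return delete $self->{ _enc($key) } }
sub CLEAR  { my ($self) = @_; %$self = () }

sub FIRSTKEY {
    my ($self) = @_;
    my $reset = keys %$self;    # calling keys resets the each() iterator
    my $k = each %$self;
    return defined $k ? _dec($k) : undef;
}

sub NEXTKEY {
    my ($self) = @_;
    my $k = each %$self;
    return defined $k ? _dec($k) : undef;
}

package main;

tie my %h, 'BERKeyHash';
$h{"caf\x{e9}"} = 'eight bit key';     # fits in 8 bits
$h{"\x{263a}"}  = 'wide key';          # needs UTF-8 internally
print scalar(keys %h), " keys stored\n";
```

Because the lookup key is always reduced to its code points first, it no
longer matters whether the caller's scalar happens to be stored as 8 bit
or as UTF-8.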

Did I miss one, or would this be a useful small module to separate out and
upload to CPAN in its own right? Clearly 5.8.0 doesn't need it:

____________________________________________________________________________
[  7980] By: jhi                                   on 2000/12/04  19:36:51
        Log: UTF-8 hash keys, patch from Inaba Hiroto.
     Branch: perl
           ! embed.h embed.pl hv.c hv.h pod/perlapi.pod proto.h
____________________________________________________________________________

but I guess there are people needing to stick on 5.6.1 who might find it
useful.

My experience of trying to manipulate data that is sometimes 8 bit, sometimes
UTF-8 on 5.6.1? "Aaaaaaaaaaaaargh".
I'd really strongly recommend upgrading to 5.8.0, where hashes, s/// and tr///
"just work".

If anyone here tries ExtUtils::Constant and finds bugs, particularly in
the Unicode/UTF8 bits, please don't hesitate to report them.

Nicholas Clark
-- 
Even better than the real thing:        http://nms-cgi.sourceforge.net/
