perl-unicode

Hassles with Unicode::String

2001-01-18 18:00:13
Using Unicode-String-2.06, I have the following test program:

=====

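#!/usr/bin/perl -w

use Unicode::String qw(utf8 utf16 uchr);
# Make Unicode::String objects stringify as UTF-8.
Unicode::String->stringify_as('utf8');

# Each entry is a whitespace-separated list of hex code points;
# the second entry includes a non-BMP code point (U+12345).
@TestArr = ("0061 0062", "0063 12345");

foreach $TheString (@TestArr) {
    @AllHexIn = split(/\s+/, $TheString);

    # Build the string one code point at a time.
    $OutString = '';
    foreach $PartString (@AllHexIn)
        { $OutString .= utf8(uchr(hex("0x$PartString"))); }

    # Length as reported by Unicode::String.
    $TheLen = utf8($OutString)->length;

    # Dump each unit of the string in U+XXXX form.
    $HexOfInput = '';
    foreach($i=0; $i<utf8($OutString)->length; $i++) {
        $HexOfInput .= utf8($OutString)->substr($i, 1)->hex . ' | ';
    }
    print "$TheString  $TheLen    $HexOfInput\n";
}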

=====

The output is:

0061 0062  2    U+0061 | U+0062 |
0063 12345  3    U+0063 | U+d808 | U+df45 |
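
For reference, the last two values on the second output line match the UTF-16 surrogate pair for U+12345. Below is a minimal sketch of that arithmetic in plain Perl, without Unicode::String, alongside the four-byte UTF-8 form of the same code point. The helper names surrogate_pair and utf8_bytes are made up for illustration, and both assume a code point in the range U+10000 to U+10FFFF.

```perl
#!/usr/bin/perl -w
use strict;

# Split a supplementary code point (U+10000..U+10FFFF) into its
# UTF-16 high and low surrogates. (Hypothetical helper.)
sub surrogate_pair {
    my ($cp) = @_;
    my $v  = $cp - 0x10000;            # 20-bit offset into the supplementary range
    my $hi = 0xD800 + ($v >> 10);      # top 10 bits
    my $lo = 0xDC00 + ($v & 0x3FF);    # bottom 10 bits
    return ($hi, $lo);
}

# Encode a code point from the same range as four UTF-8 bytes.
# (Hypothetical helper.)
sub utf8_bytes {
    my ($cp) = @_;
    return (
        0xF0 | ($cp >> 18),
        0x80 | (($cp >> 12) & 0x3F),
        0x80 | (($cp >> 6) & 0x3F),
        0x80 | ($cp & 0x3F),
    );
}

my $cp = 0x12345;
printf "UTF-16: %s\n", join ' ', map { sprintf '%04x', $_ } surrogate_pair($cp);
printf "UTF-8:  %s\n", join ' ', map { sprintf '%02x', $_ } utf8_bytes($cp);
# prints:
# UTF-16: d808 df45
# UTF-8:  f0 92 8d 85
```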

Why is uchr producing UTF-16 instead of UTF-8 for the non-BMP character?

Even if uchr is producing UTF-16, why isn't the utf8() call converting the value from UTF-16 to UTF-8?

How do I get this to produce UTF-8, which is what I need?

--Paul Hoffman