Using Unicode-String-2.06, I have the following test program:
=====
#!/usr/bin/perl -w
use Unicode::String qw(utf8 utf16 uchr);
Unicode::String->stringify_as('utf8');
# Each entry is a space-separated list of code points in hex.
@TestArr = ("0061 0062", "0063 12345");
foreach $TheString (@TestArr) {
    @AllHexIn = split(/\s+/, $TheString);
    # Build the string one code point at a time.
    $OutString = '';
    foreach $PartString (@AllHexIn) {
        $OutString .= utf8(uchr(hex("0x$PartString")));
    }
    $TheLen = utf8($OutString)->length;
    # Dump each unit of the string as reported by substr and hex.
    $HexOfInput = '';
    for ($i = 0; $i < utf8($OutString)->length; $i++) {
        $HexOfInput .= utf8($OutString)->substr($i, 1)->hex . ' | ';
    }
    print "$TheString $TheLen $HexOfInput\n";
}
=====
The output is:
0061 0062 2 U+0061 | U+0062 |
0063 12345 3 U+0063 | U+d808 | U+df45 |
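For reference, the last two values on the second line match the UTF-16 surrogate pair for U+12345. Here is a minimal sketch of that arithmetic in plain Perl; the helper names to_surrogates and from_surrogates are made up for illustration and are not part of Unicode::String:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a code point above U+FFFF into its
# UTF-16 high and low surrogates.
sub to_surrogates {
    my ($cp) = @_;
    my $v = $cp - 0x10000;                 # 20-bit offset past the BMP
    return (0xD800 + ($v >> 10),           # top 10 bits -> high surrogate
            0xDC00 + ($v & 0x3FF));        # low 10 bits -> low surrogate
}

# Hypothetical helper: recombine a surrogate pair into one code point.
sub from_surrogates {
    my ($hi, $lo) = @_;
    return 0x10000 + (($hi - 0xD800) << 10) + ($lo - 0xDC00);
}

my ($hi, $lo) = to_surrogates(0x12345);
printf "U+%04x | U+%04x\n", $hi, $lo;           # prints "U+d808 | U+df45"
printf "U+%05X\n", from_surrogates($hi, $lo);   # prints "U+12345"
```

So the length of 3 and the two extra values are consistent with the string being counted in 16-bit units rather than in characters.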
Why is uchr putting out UTF-16 instead of UTF-8 for the non-BMP character?
Even if uchr is putting out UTF-16, why isn't the utf8() call coercing
the value from UTF-16 to UTF-8?
How do I get this to put out UTF-8, which is what I need?
--Paul Hoffman