There is more confusion about the valid range of characters in Perl.
Both v5.8.8 and v5.10.0 will pack('U', $v) for values of $v which
are > 0x7FFF_FFFF. The result is Perl's own (non-standard) extended utf8
encoding for such characters.
v5.8.8 will unpack a string containing the non-standard encoding.
v5.10.0 will not: it reports the bytes as malformed UTF-8.
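For reference, the non-standard encoding looks like the original UTF-8 bit layout carried one step further: an 0xFE lead byte followed by six continuation bytes. The following is only a sketch of that layout for values up to 0xFFFF_FFFF, worked out by hand from byte sequences like those v5.8.8 produces; ext_utf8_bytes is a hypothetical helper, not anything Perl provides, and it makes no claim about how pack('U') is implemented.

```perl
use strict ;
use warnings ;

# Hypothetical helper, not part of Perl: hand-encode a code point
# (0 .. 0xFFFF_FFFF) using the original UTF-8 bit layout, extended with an
# 0xFE lead byte and six continuation bytes for values above 0x7FFF_FFFF.
sub ext_utf8_bytes {
  my ($cp) = @_ ;
  return ($cp) if $cp < 0x80 ;             # single byte, no continuation

  # Number of continuation bytes required for this value.
  my $n = $cp < 0x800       ? 1
        : $cp < 0x1_0000    ? 2
        : $cp < 0x20_0000   ? 3
        : $cp < 0x400_0000  ? 4
        : $cp < 0x8000_0000 ? 5
        :                     6 ;
  my @lead = (0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC, 0xFE) ;

  my @bytes ;
  for (1 .. $n) {
    unshift @bytes, 0x80 | ($cp & 0x3F) ;  # six payload bits per continuation byte
    $cp >>= 6 ;
  }
  unshift @bytes, $lead[$n] | $cp ;        # remaining high bits go in the lead byte
  return @bytes ;
}

for my $v (0x7FFF_FFFD, 0x8000_0000, 0xFFFF_FFFD) {
  printf '\x%04X_%04X: ', ($v >> 16), $v & 0xFFFF ;
  my $hex = join '', map { sprintf '\x%02X', $_ } ext_utf8_bytes($v) ;
  print "$hex\n" ;
}
```

For 0x8000_0000 this gives \xFE\x82\x80\x80\x80\x80\x80, i.e. the top two bits of the 32-bit value land in the first continuation byte.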
Consider:
  use warnings ;
  sub sp {
    my ($v) = @_ ;
    my $p = pack('U', $v) ;
    my @t = unpack('C*', $p) ;
    printf '\x%04X_%04X: ', ($v >> 16), $v & 0xFFFF ;
    print map sprintf('\x%02X', $_), @t ;
    print "\n" ;
  } ;
  sp(0x7FFF_FFFD) ;
  sp(0x8000_0000) ;
  sp(0xFFFF_FFFD) ;
v5.8.8 result:
\x7FFF_FFFD: \xFD\xBF\xBF\xBF\xBF\xBD
\x8000_0000: \xFE\x82\x80\x80\x80\x80\x80
\xFFFF_FFFD: \xFE\x83\xBF\xBF\xBF\xBF\xBD
v5.10.0 result:
\x7FFF_FFFD: \x7FFFFFFD
Malformed UTF-8 character (byte 0xfe) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0x82, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0x80, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0x80, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0x80, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0x80, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0x80, with no
preceding start byte) in unpack at tpbug.pl line 7.
\x8000_0000: \x00\x00\x00\x00\x00\x00\x00
Malformed UTF-8 character (byte 0xfe) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0x83, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0xbf, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0xbf, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0xbf, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0xbf, with no
preceding start byte) in unpack at tpbug.pl line 7.
Malformed UTF-8 character (unexpected continuation byte 0xbd, with no
preceding start byte) in unpack at tpbug.pl line 7.
\xFFFF_FFFD: \x00\x00\x00\x00\x00\x00\x00
And, FWIW, in 64-bit v5.8.8, pack('U', $v) appears to mask the value of $v
to an unsigned 32-bit integer before attempting to pack it!
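One way to check the masking on a given perl might be something like the sketch below. It packs a 33-bit value whose low 32 bits are 0x41 and compares the result with packing 0x41 directly. pack_u_masks_to_32_bits is a hypothetical name, the test value is an arbitrary choice, and the answer will depend on the perl version and build, so no result is claimed here.

```perl
use strict ;
use warnings ;
use Config ;

# Hypothetical probe: returns 1 if pack('U', $v) gives the same string as
# packing only the low 32 bits of $v, 0 if it does not, and undef when this
# perl's integers are too narrow to hold the test value.
sub pack_u_masks_to_32_bits {
  return undef if $Config{ivsize} < 8 ;
  no warnings ;                          # silence non-Unicode code point warnings
  my $v = (1 << 32) + 0x41 ;             # 0x1_0000_0041; low 32 bits are 'A'
  return pack('U', $v) eq pack('U', $v & 0xFFFF_FFFF) ? 1 : 0 ;
}

my $r = pack_u_masks_to_32_bits() ;
print defined $r ? ($r ? "masked\n" : "not masked\n")
                 : "not testable on this perl\n" ;
```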
--
Chris Hall highwayman.com +44 7970 277 383