perl-unicode

tr///CU and tr///UC

1998-10-14 13:17:27
Are my expectations wrong?

#!/usr/bin/perl

use utf8;

$_ = "\x{FFFF}\x{EEE}\x{DD}";
tr/\x{FFFF}/\xCC/UC;

print "not " unless $_ eq "\xCC\x{EEE}\x{DD}";
print "ok 3\n";

print unpack("H*", $_), "\n";
# prints: cceedd

#-----------------------

$_ = "\274\0\275\0\276";
tr/\275/\x{FFFF}/CU;

print "not " unless $_ eq "\275\0\x{FFFF}\0\276";
print "ok 4\n";

print unpack("H*", $_), "\n";
# prints: c2bc00efbfbf00c2be

__END__

It looks like all characters in the string that tr operates on are
converted from UTF-8 to 8-bit bytes (or vice versa).  I would expect
only the characters mentioned in the first part of the tr// to be
affected.
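
To make the expectation concrete, here is a minimal sketch of the behaviour test 3 asks for. It assumes a perl in which a plain tr (no C or U flags) works on characters rather than bytes; the helper name replace_ffff is made up for illustration and is not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: character-wise tr with no C/U flags.
# Only U+FFFF is in the search list, so only U+FFFF should change.
sub replace_ffff {
    my ($s) = @_;
    (my $t = $s) =~ tr/\x{FFFF}/\xCC/;
    return $t;
}

my $out = replace_ffff("\x{FFFF}\x{EEE}\x{DD}");

# Show the code points rather than the raw bytes.
print join(" ", map { sprintf "U+%04X", ord($_) } split //, $out), "\n";
# prints: U+00CC U+0EEE U+00DD
```

U+0EEE and U+00DD come through untouched, which is what the "not ok" check in test 3 is looking for.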

This means that we can currently use it as a filter to convert
between Latin-1 and UTF-8:

   perl -Mutf8 -pe 'tr///CU'
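
For reference, the Latin-1 to UTF-8 direction of such a filter can be sketched by hand at the byte level. This is only an illustration of the encoding arithmetic, not a description of how tr///CU is implemented; the function name latin1_to_utf8 is an assumption of mine.

```perl
use strict;
use warnings;

# Hypothetical helper: re-encode every Latin-1 byte >= 0x80 as the
# two-byte UTF-8 sequence for the same code point.  Bytes below 0x80
# (including NUL) are already valid UTF-8 and pass through unchanged.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    $bytes =~ s/([\x80-\xFF])/
        chr(0xC0 | (ord($1) >> 6)) . chr(0x80 | (ord($1) & 0x3F))
    /gex;
    return $bytes;
}

print unpack("H*", latin1_to_utf8("\274\0\275\0\276")), "\n";
# prints: c2bc00c2bd00c2be
```

The c2bc and c2be pairs are the same ones that show up in the output of test 4 above, where \274 and \276 were converted even though they were not in the tr search list.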

Regards,
Gisle
