perl-unicode

Re: utf8::upgrade,utf8::encode and utf8::is_utf8 on EBCDIC platform

2005-09-01 15:18:47
Hello.
I think it is correct.

On EBCDIC platforms, perl uses UTF-EBCDIC instead of UTF-8,
but it still calls the encoding "utf8".

In general, chr(0xFF) (equal to "\xFF") in EBCDIC encodings
corresponds to U+009F, which is a single-octet control character;
thus the single-octet sequence "\xFF" is well-formed in UTF-EBCDIC too.
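
As a rough sketch, the three calls from the question can be probed
like this. The variable names are only illustrative. Note that
utf8::is_utf8 reports perl's internal flag on the string, not whether
some octets happen to look like UTF-8. On an ASCII platform this should
report 2 octets from utf8::upgrade and a length of 2 after utf8::encode;
the EBCDIC result reported in this thread is 1 for both.

```perl
use strict;
use warnings;

# One character with native code point 0xFF.
my $s = chr(0xFF);

# utf8::upgrade converts the internal representation in place and
# returns the number of octets needed to hold the string as UTF-X.
my $octets = utf8::upgrade($s);

# After upgrading, the internal flag is set, but the string is
# still one character long.
my $flag  = utf8::is_utf8($s) ? 1 : 0;
my $chars = length($s);

# utf8::encode replaces the characters with their UTF-X octets,
# so length() then counts octets.
my $e = chr(0xFF);
utf8::encode($e);
my $encoded_len = length($e);

printf "upgrade returned %d, is_utf8 %d, length %d\n",
    $octets, $flag, $chars;
printf "length after encode: %d\n", $encoded_len;
```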

If you want to convert an integer to a character according to
Unicode scalar values, use pack('U') rather than chr().
For example, pack('U', 0xFF) should correspond to U+00FF
(y with diaeresis) everywhere (both on ASCII and on EBCDIC).
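
A minimal sketch of the difference, assuming a perl recent enough to
provide utf8::native_to_unicode (the variable names are made up for
illustration). On an ASCII platform native and Unicode code points
coincide, so both lines should show FF. On EBCDIC the first line should
show U+009F, and the second depends on the code page in use.

```perl
use strict;
use warnings;

# chr() interprets its argument as a native code point.
my $native = chr(0xFF);

# pack('U') interprets its argument as a Unicode code point.
my $unicode = pack('U', 0xFF);

# Map the native code point of chr(0xFF) to its Unicode equivalent.
my $native_as_unicode = utf8::native_to_unicode(ord $native);

# ord() gives back the native code point of the U+00FF character.
my $unicode_as_native = ord $unicode;

printf "chr(0xFF)       is U+%04X\n", $native_as_unicode;
printf "pack('U', 0xFF) has native code point 0x%02X\n",
    $unicode_as_native;
```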

Regards,
SADAHIRO Tomoyuki

Hi,

These are the test cases I'm running on an EBCDIC platform:

my $b = chr(0x0FF);
my $p = utf8::upgrade($b);
print "\n$p";

utf8::upgrade returns the number of octets necessary
to represent the string as UTF-X.

The output on EBCDIC is 1, whereas the output on an ASCII platform is 2.
Is the return value I'm getting on EBCDIC correct?

my $c=chr(0x0FF);
print "before $c\n";
print "\n";
utf8::encode($c);
print "after $c\n";
print length($c);

On ASCII, the string is a single-octet representation before
encode and two octets after it, so the length is 2.

On EBCDIC, it is a single octet both before and after encode,
and the length is 1.

Is this correct on EBCDIC, or is it a bug in the code for
EBCDIC?

utf8::is_utf8 tests whether STRING is in UTF-8, so is 0x0FF
UTF-8 on EBCDIC?
