perl-unicode

Re: bareword test on ebcdic.

2005-07-28 00:35:31


Nicholas Clark <nick(_at_)ccl4(_dot_)org> wrote:
On Tue, Jul 26, 2005 at 08:48:10AM -0700, rajarshi das wrote:

For the code points being tested
("\x{0442}\x{0435}\x{0441}\x{0442}")
does the perl source file contain the correct byte
sequence in UTF-EBCDIC?
Yes, it does, since I ran this test:
if ($hash{"\x{0442}\x{0435}\x{0441}\x{0442}"} eq
    $hash{eval '"\x{0442}\x{0435}\x{0441}\x{0442}"'}) {
    print "ok\n";
}
and the test ran fine, if that is what you mean by the
source file containing the correct byte sequence. Or
am I mistaken?

You are mistaken, I'm afraid. A bareword means no quotes.

In ASCII and UTF-8 land, the one-liner

$ perl -le 'use utf8; $a{ඬ}++; print map {ord} keys %a'

gives

3500


The 3 bytes in the source code between '{' and '}' are 224, 182 and 172
which are the UTF-8 encoding for the code point 3500.
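
One way to check those byte values, and perhaps to answer the same question on the EBCDIC machine, is to let perl do the encoding. This is only a sketch: native_utf8_bytes is a made-up helper name, and it assumes that utf8::encode on an EBCDIC build produces UTF-EBCDIC rather than UTF-8, which should be confirmed against perlebcdic for the perl in use.

```perl
use strict;
use warnings;

# Return the encoded octets perl uses internally for one code point:
# UTF-8 on ASCII platforms; assumed to be UTF-EBCDIC on EBCDIC builds.
sub native_utf8_bytes {
    my ($code_point) = @_;
    my $string = chr $code_point;
    utf8::encode($string);          # in place: characters -> encoded octets
    return unpack 'C*', $string;    # one number per octet
}

print join(', ', native_utf8_bytes(3500)), "\n";
```

On an ASCII platform this prints 224, 182, 172, matching the three bytes quoted above.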

My question is, what are the bytes in UTF-EBCDIC that encode code point 3500?

The equivalent bytes in UTF-EBCDIC are 186, 84 and 83.

If you put those 3 bytes directly between the '{' and '}' characters in
the EBCDIC version of that one-liner, does it also print 3500?
I cannot put those three bytes into the one-liner you mentioned above, because
I cannot type the characters corresponding to those bytes
(www.kostis.net/charsets/ebc1047.htm) on the command line.
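
If the bytes cannot be typed on the command line, one possible workaround is to have a small script write them into a source file and then run that file instead of a one-liner. The sketch below assumes this approach is acceptable for the test; build_source and the file name bareword_ebcdic.pl are hypothetical, and 186, 84, 83 are simply the values reported in this thread.

```perl
use strict;
use warnings;

# Build the one-liner's source text with raw octets between the braces.
# build_source() is a hypothetical helper, not part of any test suite.
sub build_source {
    my @octets = @_;
    return 'use utf8; $a{' . pack('C*', @octets) . '}++; '
         . 'print map {ord} keys %a;' . "\n";
}

# 186, 84, 83 are the values reported in this thread (not verified here).
my $file = 'bareword_ebcdic.pl';    # hypothetical file name
open my $fh, '>:raw', $file or die "Cannot write $file: $!";
print {$fh} build_source(186, 84, 83);
close $fh or die "Cannot close $file: $!";
```

Running "perl -l bareword_ebcdic.pl" on the EBCDIC machine would then show whether the bareword key comes back as 3500.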

If so, *that* would explain the failures, and be the thing that needs
correcting. The test file would need if/else with a different test on
EBCDIC.
What would you suggest putting in the if/else?

I think that the regression tests tended to do something like

if (ord 'A' == 65) {
    # Do the ASCII/UTF-8 version
} else {
    # Assume EBCDIC
}
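
Filled in for code point 3500, that pattern might look like the sketch below. It is an illustration only: expected_bytes is a made-up name, the EBCDIC values are the ones reported earlier in this thread and are not verified here, and it assumes utf8::decode understands UTF-EBCDIC on an EBCDIC build. It checks the byte sequence itself; a real bareword test would still need those bytes to appear literally in the source file.

```perl
use strict;
use warnings;

# Pick the expected encoded octets for code point 3500 by platform.
sub expected_bytes {
    if (ord('A') == 65) {
        return (224, 182, 172);     # ASCII/UTF-8 version
    } else {
        return (186, 84, 83);       # assume EBCDIC (values from this thread)
    }
}

my $encoded = pack 'C*', expected_bytes();

# Decode in place and check that we get exactly one character, 3500.
my $ok = utf8::decode($encoded)
      && length($encoded) == 1
      && ord($encoded) == 3500;
print $ok ? "ok\n" : "not ok\n";
```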


Thanks,

Rajarshi.


Nicholas Clark



                