perl-unicode

Re: Transliteration operator (tr//) on EBCDIC platform

2005-08-08 07:37:01
On Thu, Aug 04, 2005 at 11:42:54AM +0530, Sastry wrote:
Hi

I am trying to run this script on an EBCDIC platform using perl-5.8.6
 
($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
is($a, "XXXXXXXX");


The result I get is 

 'X«»ðý±°X'

a) Is this happening because \x8a\x8b\x8c\x8d\x8f\x90 are the gapped
characters in EBCDIC?

I think so: in EBCDIC, \x89 is 'i' and \x91 is 'j', so the bytes in
between are not letters.
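
To make the gap concrete, here is a minimal C sketch. It assumes the usual
EBCDIC layout (code pages 037/1047) in which the lowercase letters occupy
0x81-0x89, 0x91-0x99 and 0xA2-0xA9. The helpers ebcdic_is_lower() and
count_non_letters() are hypothetical names for illustration, not anything
in the perl source.

```c
/* Hypothetical helper: nonzero if byte c is a lowercase letter in EBCDIC.
 * Assumed layout: a-i = 0x81-0x89, j-r = 0x91-0x99, s-z = 0xA2-0xA9. */
static int ebcdic_is_lower(unsigned int c)
{
    return (c >= 0x81 && c <= 0x89) ||
           (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

/* Hypothetical helper: count the code points in [lo, hi] that are
 * not lowercase letters under the layout assumed above. */
static int count_non_letters(unsigned int lo, unsigned int hi)
{
    int n = 0;
    unsigned int c;

    for (c = lo; c <= hi; c++)
        if (!ebcdic_is_lower(c))
            n++;
    return n;
}
```

Under that assumption, count_non_letters(0x89, 0x91) counts the seven code
points 0x8A through 0x90 that sit between 'i' and 'j'.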


b) Should all the bytes in $a change to X?

I don't know. There seems to be some special-case code in regcomp.c:

#ifdef EBCDIC
                /* In EBCDIC [\x89-\x91] should include
                 * the \x8e but [i-j] should not. */
                if (literal_endpoint == 2 &&
                    ((isLOWER(prevvalue) && isLOWER(ceilvalue)) ||
                     (isUPPER(prevvalue) && isUPPER(ceilvalue))))
                {
                    if (isLOWER(prevvalue)) {
                        for (i = prevvalue; i <= ceilvalue; i++)
                            if (isLOWER(i))
                                ANYOF_BITMAP_SET(ret, i);
                    } else {
                        for (i = prevvalue; i <= ceilvalue; i++)
                            if (isUPPER(i))
                                ANYOF_BITMAP_SET(ret, i);
                    }
                }
                else
#endif


which I assume makes [i-j] in a regexp skip the gap, while [\x89-\x91]
does not. I don't know where ranges in tr/// are parsed, but I grepped
for EBCDIC and found no analogous code, so it looks like tr/\x89-\x91//
is treated as tr/i-j//, and in turn i-j is treated as a range of letters
and always special-cased.
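
As a rough illustration of the distinction that excerpt draws, the sketch
below expands a range into a 256-entry membership table in two ways: with
literal letter endpoints (only letters are kept) and with numeric endpoints
(every byte is kept). It is a simplified stand-in, not the perl code: the
EBCDIC letter layout is hard-coded as an assumption, and expand_range(),
set_size() and the literal_endpoints flag are hypothetical names.

```c
#include <string.h>

/* Assumed EBCDIC layout (code pages 037/1047). */
static int ebcdic_is_lower(unsigned int c)
{
    return (c >= 0x81 && c <= 0x89) ||
           (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

static int ebcdic_is_upper(unsigned int c)
{
    return (c >= 0xC1 && c <= 0xC9) ||
           (c >= 0xD1 && c <= 0xD9) ||
           (c >= 0xE2 && c <= 0xE9);
}

/* Mark the members of the range [lo, hi] in set[].
 * literal_endpoints is nonzero when both endpoints were written as
 * literal characters (e.g. i-j) rather than as escapes (\x89-\x91). */
static void expand_range(unsigned char set[256], unsigned int lo,
                         unsigned int hi, int literal_endpoints)
{
    int both_lower = ebcdic_is_lower(lo) && ebcdic_is_lower(hi);
    int both_upper = ebcdic_is_upper(lo) && ebcdic_is_upper(hi);
    unsigned int c;

    memset(set, 0, 256);
    for (c = lo; c <= hi && c < 256; c++) {
        if (literal_endpoints && both_lower)
            set[c] = (unsigned char)ebcdic_is_lower(c); /* skip the gap */
        else if (literal_endpoints && both_upper)
            set[c] = (unsigned char)ebcdic_is_upper(c); /* skip the gap */
        else
            set[c] = 1;                                 /* every byte */
    }
}

/* Number of bytes marked as members of the set. */
static int set_size(const unsigned char set[256])
{
    int n = 0, i;

    for (i = 0; i < 256; i++)
        n += set[i] != 0;
    return n;
}
```

With the range 0x89 to 0x91, the literal form yields a two-member set
('i' and 'j'), while the numeric form yields all nine bytes, including
\x8e.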

I don't know whether tr/i-j// and tr/\x89-\x91// should behave differently
(i.e. whether we currently have a bug).
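
For comparison, here is a small C simulation of the two candidate
behaviours applied to the eight bytes from the original script. It is only
a sketch of the question, not of how tr/// is implemented: tr_range() and
its letters_only flag are hypothetical, and the EBCDIC lowercase layout is
assumed as before.

```c
#include <stddef.h>

/* Assumed EBCDIC layout: a-i = 0x81-0x89, j-r = 0x91-0x99, s-z = 0xA2-0xA9. */
static int ebcdic_is_lower(unsigned int c)
{
    return (c >= 0x81 && c <= 0x89) ||
           (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

/* Replace, in place, each byte of buf that lies in [lo, hi] with repl.
 * If letters_only is nonzero, bytes in the range that are not lowercase
 * letters are left untouched (the "i-j" reading); otherwise every byte
 * in the range is replaced (the "\x89-\x91" reading).
 * Returns the number of bytes replaced. */
static size_t tr_range(unsigned char *buf, size_t len,
                       unsigned char lo, unsigned char hi,
                       unsigned char repl, int letters_only)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (c < lo || c > hi)
            continue;
        if (letters_only && !ebcdic_is_lower(c))
            continue;
        buf[i] = repl;
        n++;
    }
    return n;
}
```

On the input \x89\x8a\x8b\x8c\x8d\x8f\x90\x91, the letters-only reading
replaces just the first and last bytes, which matches the shape of the
reported result; the byte-range reading replaces all eight, which is what
the test expects.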

Nicholas Clark