perl-unicode

[PATCH] Re: Transliteration operator(tr//)on EBCDIC platform

2005-08-11 11:32:10

On Wed, 10 Aug 2005 23:56:31 -0700 (PDT), rajarshi das 
<dazio_r(_at_)yahoo(_dot_)com> wrote

Hi,
This is Rajarshi expressing Sastry's viewpoints since he's on vacation. 

SADAHIRO Tomoyuki <bqw10602(_at_)nifty(_dot_)com> wrote:

According to the above statement in perlebcdic.pod,
s/[\x89-\x91]/X/g must substitute \x8e with X.
But it doesn't concern whether tr/\x89-\x91/X/ would substitute \x8e
with X or not, since tr/// does not use brackets, [ ].

Though I think ranges in [ ] and ranges in tr/// should coincide
and agree that tr/\x89-\x91/X/ should substitute \x8e with X,
that is just my opinion.
I don't know whether it is true and correct.
Is there some way we can confirm whether this is the correct (and expected)
behaviour, since there isn't any explicit documentation for the tr operator?

Since t/op/tr.t already has a test case (cf. Change 9038),
which Sastry previously pointed out is failing on the EBCDIC platform,
I assume that at least the then pumpking thought it to be correct.

By the way, when you say "If I specify [\x89-\x91]", does it
mean s/[\x89-\x91]/X/g or tr/\x89-\x91/X/ ? I'm confused.
We mean tr/\x89-\x91/X/.


We were first informed by you that gapped characters are not
substituted with X by tr/\x89-\x91/X/.
And you said s/[\x89-\x91]/X/g substituted all the characters,
including gapped characters, with X, didn't you?

Yes.
If so, I assume your [\x89-\x91], which doesn't match any of
the gapped characters, to be tr/\x89-\x91/X/.
That's correct. We mean tr/\x89-\x91/X/.


The following is a part of the current core tests from op/pat.t.
I believe they should pass.
Yes, all the following tests pass. I think the following tests are in the
context of the s/[]/X/ operator and hence pass.

Thanks,

Rajarshi.

OK. To me, it is confirmed that s/[]/X/ is fine and tr/// has a problem.
Since I don't have any EBCDIC machine, I can't be sure that the following
patch really makes sense.

Regards,
SADAHIRO Tomoyuki

! t/op/tr.t, toke.c

diff -ur perl~/t/op/tr.t perl/t/op/tr.t
--- perl~/t/op/tr.t     Mon Aug 01 17:17:24 2005
+++ perl/t/op/tr.t      Thu Aug 11 23:41:22 2005
@@ -295,18 +295,15 @@
 # (i-j, r-s, I-J, R-S), [\x89-\x91] [\xc9-\xd1] has to match them,
 # from Karsten Sperling.
 
-# Not working in EBCDIC as of 12674.
 $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/\x89-\x91/X/;
 is($c, 8);
 is($a, "XXXXXXXX");
-   
-# Not working in EBCDIC as of 12674.
+
 $c = ($a = "\xc9\xca\xcb\xcc\xcd\xcf\xd0\xd1") =~ tr/\xc9-\xd1/X/;
 is($c, 8);
 is($a, "XXXXXXXX");
 
-
-SKIP: {   
+SKIP: {
     skip "not EBCDIC", 4 unless $Is_EBCDIC;
 
     $c = ($a = "\x89\x8a\x8b\x8c\x8d\x8f\x90\x91") =~ tr/i-j/X/;
diff -ur perl~/toke.c perl/toke.c
--- perl~/toke.c        Mon Jul 18 04:31:02 2005
+++ perl/toke.c Thu Aug 11 22:55:18 2005
@@ -1368,6 +1368,9 @@
     I32  has_utf8 = FALSE;                     /* Output constant is UTF8 */
     I32  this_utf8 = UTF;                      /* The source string is assumed to be UTF8 */
     UV uv;
+#ifdef EBCDIC
+    UV literal_endpoint = 0;
+#endif
 
     const char *leaveit =      /* set of acceptably-backslashed characters */
        PL_lex_inpat
@@ -1417,8 +1420,9 @@
                 }
 
 #ifdef EBCDIC
-               if ((isLOWER(min) && isLOWER(max)) ||
-                   (isUPPER(min) && isUPPER(max))) {
+               if (literal_endpoint == 2 &&
+                   ((isLOWER(min) && isLOWER(max)) ||
+                    (isUPPER(min) && isUPPER(max)))) {
                    if (isLOWER(min)) {
                        for (i = min; i <= max; i++)
                            if (isLOWER(i))
@@ -1437,6 +1441,9 @@
                /* mark the range as done, and continue */
                dorange = FALSE;
                didrange = TRUE;
+#ifdef EBCDIC
+               literal_endpoint = 0;
+#endif
                continue;
            }
 
@@ -1455,6 +1462,9 @@
            }
            else {
                didrange = FALSE;
+#ifdef EBCDIC
+               literal_endpoint = 0;
+#endif
            }
        }
 
@@ -1788,6 +1798,10 @@
            s++;
            continue;
        } /* end if (backslash) */
+#ifdef EBCDIC
+       else
+           literal_endpoint++;
+#endif
 
     default_action:
        /* If we started with encoded form, or already know we want it
###END OF PATCH

