perl-unicode

Re: iso-2022-jp encoding on EBCDIC

2005-12-19 23:29:11


--- SADAHIRO Tomoyuki <bqw10602(_at_)nifty(_dot_)com> wrote:


On Wed, 14 Dec 2005 05:19:00 -0800 (PST), rajarshi
das <dazio_r(_at_)yahoo(_dot_)com> wrote

 Hi,

The following two line script gives an error on
z/OS : "Unknown encoding 'iso-2022-jp' at line ..".
-----------------
use Encode;
use encoding 'iso-2022-jp';
------------

How do we confirm if iso-2022-jp is supported on
z/OS or not ?
Or if it is supported and not working as expected
for some reason ? 
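
One way to probe this from a script, as a minimal sketch: Encode's find_encoding() returns an encoding object when it can resolve a name and a false value otherwise. The helper name encoding_supported is hypothetical and only used for illustration.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: returns 1 if Encode can resolve the name, else 0.
# find_encoding() may try to load the backing module (here Encode::JP),
# so it is wrapped in eval in case that module dies while loading.
sub encoding_supported {
    my ($name) = @_;
    my $enc = eval { Encode::find_encoding($name) };
    return ref $enc ? 1 : 0;
}

for my $name ('iso-2022-jp', 'no-such-encoding') {
    printf "%-18s %s\n", $name,
        encoding_supported($name) ? 'supported' : 'not supported';
}
```

Running the same script on linux and on z/OS should show whether the name resolves on each platform.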

I found this chunk in Encode/Config.pm:

unless (ord("A") == 193){
    %ExtModule =
      (
(snip)
       '7bit-jis'           => 'Encode::JP',
       'euc-jp'             => 'Encode::JP',
       'iso-2022-jp'        => 'Encode::JP',
       'iso-2022-jp-1'      => 'Encode::JP',
       'jis0201-raw'        => 'Encode::JP',
       'jis0208-raw'        => 'Encode::JP',
       'jis0212-raw'        => 'Encode::JP',
       'cp932'              => 'Encode::JP',
       'MacJapanese'        => 'Encode::JP',
       'shiftjis'           => 'Encode::JP',

I placed the lines 
       'iso-2022-jp' ....
       'iso-2022-jp-1' .... 
        ...
        ....
       'shiftjis' .... 

outside the unless block. 

Then, it started recognising the iso-2022-jp encoding.



And I found this in Encode/JP.pm (similarly in CN,
KR, TW)

BEGIN {
    if (ord("A") == 193) {
      die "Encode::JP not supported on EBCDIC\n";
    }
}

I commented out the above lines in the BEGIN block.
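
For reference, the ord("A") == 193 comparison quoted above is the check both files use to detect an EBCDIC platform. A small sketch of the same check in isolation (the sub name is_ebcdic is hypothetical):

```perl
use strict;
use warnings;

# Same comparison as in Encode/Config.pm and Encode/JP.pm:
# "A" is code point 193 (0xC1) on EBCDIC code pages and 65 on ASCII ones.
sub is_ebcdic { return ord("A") == 193 ? 1 : 0 }

printf "ord('A') = %d, platform looks %s\n",
    ord("A"), is_ebcdic() ? 'EBCDIC' : 'ASCII-based';
```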

Any pointers to the source where this encoding is
defined ?
The pointers might help understand why it is not
defined
on an EBCDIC platform. 

I printed Encode->encodings() on linux as well as
z/OS
and they are identical (and both do not contain
iso-2022-jp).
Is this in any way related to the above problem?

Try this on linux and z/OS. I guess they are not
identical.

  use Encode;
  use Encode::JP; # load JP encodings including iso-2022-jp
  print join("\n", Encode->encodings()), "\n";
The above gives different results on linux and ebcdic.
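
A related check, sketched under the assumption that the stock Encode module is installed: Encode->encodings() lists only the encodings already loaded, while Encode->encodings(":all") also lists those that can be loaded on demand. Comparing the two may explain why a plain listing shows no iso-2022-jp on either platform.

```perl
use strict;
use warnings;
use Encode;

# Names already loaded vs. every name Encode knows how to load on demand.
my %loaded = map { $_ => 1 } Encode->encodings();
my %all    = map { $_ => 1 } Encode->encodings(":all");

for my $name ('iso-2022-jp', 'euc-jp', 'shiftjis') {
    printf "%-12s loaded=%s known=%s\n", $name,
        $loaded{$name} ? 'yes' : 'no',
        $all{$name}    ? 'yes' : 'no';
}
```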


I am testing this with iso-2022-jp encoding :
------------------------
use encoding 'iso-2022-jp';

$a = "^[$B$!^[(B";
print "a : $a\n";
------------------------

On linux, I get :
a : ^[^[(B 
/* Why is the '(B' shown? Isn't this just an escape
sequence to switch over to ASCII? */
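
A possible factor here: inside a double-quoted Perl string, $B and $! are interpolated as variables, which may be why only the escape sequences survive in the output. The sketch below sidesteps that by spelling out the octets and calling decode() directly instead of relying on the encoding pragma. It assumes an ASCII-based Perl where Encode::JP loads.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Explicit octet values, so nothing in the literal can be interpolated.
my $bytes = "\x1b\x24\x42"    # ESC $ B : switch to JIS X 0208
          . "\x24\x21"        # one two-byte character ("$!" in ASCII)
          . "\x1b\x28\x42";   # ESC ( B : switch back to ASCII

my $text = decode('iso-2022-jp', $bytes);

# Report the decoded code points instead of printing raw characters.
my @code_points = map { sprintf 'U+%04X', ord($_) } split //, $text;
print "@code_points\n";
```

On an ASCII-based Perl this should print U+3041 (HIRAGANA LETTER SMALL A), with the two escape sequences consumed by the decoder.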

On ebcdic, I get : 
Malformed UTF-8 character (unexpected end of string) at /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line 330.
Malformed UTF-8 character (unexpected continuation byte 0x6a, with no preceding start byte) in pattern match (m//) at /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line 337.
Malformed UTF-8 character (unexpected continuation byte 0x6a, with no preceding start byte) in pattern match (m//) at /u/isldev2/tmp_dbg/perl-5.8.7/lib/utf8_heavy.pl line 337.

-- and some junk data.

It seems that in "$B$!^[(B" above, $! and ^[ are
incorrect two-byte sequences on EBCDIC. However, $!
does not translate into printable characters on cp-1047.
What do we replace them with?

I tested again with  : 
---------------------------------
use encoding 'iso-2022-jp';
$a = "$B&&(B"; # && is \x50\x50 on EBCDIC, which is valid according to jis0208.ucm
print "a : $a\n";
----------------------------------

But I still get the same messages as above, along with some junk
data in $a, which I don't think is the correct output.
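
To see which octets iso-2022-jp actually expects for a given character, one option is to go in the other direction and hex-dump the output of encode(). This is a sketch for an ASCII-based Perl; the sample character U+0396 is chosen because its JIS X 0208 code is 0x2626, the octets that "&&" has in ASCII. In cp-1047 source the same literal is 0x50 0x50, which would name a different JIS X 0208 code.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# U+0396 GREEK CAPITAL LETTER ZETA is used as sample input.
my $char  = "\x{0396}";
my $bytes = encode('iso-2022-jp', $char);

# Hex dump of the encoded octets, e.g. to compare against jis0208.ucm.
my $hex = join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
print "$hex\n";

# Round trip back to characters to confirm nothing was lost.
my $back = decode('iso-2022-jp', $bytes);
print $back eq $char ? "round trip ok\n" : "round trip failed\n";
```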

Rajarshi.

Regards,
SADAHIRO Tomoyuki





