Re: Encode's .enc files and a question


On Wed, 25 Oct 2000, Philip Newton wrote:

I didn't read up on the format, but I would guess that this maps from
EBCDIC position to Unicode in this way: take the EBCDIC code point and
treat it as an index into an array of four-character Unicode code points.
In which case, your table looks rather unlikely, since the last line
should then start "0030003100320033" -- that is, F0 .. F9 should map to
U+0030 .. U+0039, the digits.

I don't remember the code points for letters, but I'm fairly sure the
digits fall in the range F0 .. F9 in all flavours of EBCDIC. You have
U+0031 at position 90.


Oops.  Thanks for spotting that.  I had the map inverted and was
writing a Decode file, not an Encode file.  Assuming that the other
things at the head of the file are OK (and I guess we have Nick's Tcl
doc to help with that), then this might do it:

# Encoding file: cp1047, single-byte
S
003F 0 1
00
0000000100020003009C00090086007F0097008D008E000B000C000D000E000F
0010001100120013009D000A00080087001800190092008F001C001D001E001F
0080008100820083008400850017001B00880089008A008B008C000500060007
0090009100160093009400950096000400980099009A009B00140015009E001A
002000A000E200E400E000E100E300E500E700F100A2002E003C0028002B007C
002600E900EA00EB00E800ED00EE00EF00EC00DF00210024002A0029003B005E
002D002F00C200C400C000C100C300C500C700D100A6002C0025005F003E003F
00F800C900CA00CB00C800CD00CE00CF00CC0060003A002300400027003D0022
00D800610062006300640065006600670068006900AB00BB00F000FD00FE00B1
00B0006A006B006C006D006E006F00700071007200AA00BA00E600B800C600A4
00B5007E0073007400750076007700780079007A00A100BF00D0005B00DE00AE
00AC00A300A500B700A900A700B600BC00BD00BE00DD00A800AF005D00B400D7
007B00410042004300440045004600470048004900AD00F400F600F200F300F5
007D004A004B004C004D004E004F00500051005200B900FB00FC00F900FA00FF
005C00F70053005400550056005700580059005A00B200D400D600D200D300D5
003000310032003300340035003600370038003900B300DB00DC00D900DA009F
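
The layout can be spot-checked by hand before running anything through
Encode.  The sketch below assumes, as guessed above, that each row holds
sixteen entries of four hex digits and that the low nibble of the EBCDIC
byte selects the column.  The helper name entry_at() is made up for this
illustration and is not part of Encode; the row is the last line of the
table, copied verbatim.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Last row of the cp1047 table (EBCDIC bytes F0 .. FF), copied verbatim.
my $row_f = '003000310032003300340035003600370038003900B300DB00DC00D900DA009F';

# Hypothetical helper: return the Unicode code point stored in a given
# column of a row, ASSUMING every entry is exactly four hex digits wide.
sub entry_at {
    my ($row, $col) = @_;
    return hex(substr($row, 4 * $col, 4));
}

for my $byte (0xF0 .. 0xF9) {
    my $col = $byte & 0x0F;              # low nibble picks the column
    my $ucs = entry_at($row_f, $col);
    printf "%02X -> U+%04X (%s)\n", $byte, $ucs, chr($ucs);
}
```

If the assumption about the layout holds, F0 .. F9 come out as
U+0030 .. U+0039, which is the check Philip suggested.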

and here is a mini test:

#!/usr/local/test/bin/perl -w

use strict;

use Encode qw(&from_to);

if (ord('A') == 65) {
    print "This looks suspiciously like an ascii machine\n";
}
else {
    print "The code point for the letter 'A' is ",ord('A'),"\n";
}

foreach my $char ("0".."9") {
    printf "The code point for the letter '%s' is %d (%02X hex)\n",
            $char,ord($char),ord($char);

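    # from_to() converts $char in place; one octet is expected back.
    my $rc = from_to($char,'iso8859-1','cp1047');
    $rc = 'undef' unless defined $rc;   # avoid -w noise if the call fails

    print "Error: The return value of from_to() was '$rc'\n" if ($rc ne '1');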
    printf "The code point for the letter is now %d (%02X hex)\n",
           ord($char),ord($char);
}
__END__

which yields:

 This looks suspiciously like an ascii machine
 The code point for the letter '0' is 48 (30 hex)
 The code point for the letter is now 240 (F0 hex)
 The code point for the letter '1' is 49 (31 hex)
 The code point for the letter is now 241 (F1 hex)
<snip>
 The code point for the letter '8' is 56 (38 hex)
 The code point for the letter is now 248 (F8 hex)
 The code point for the letter '9' is 57 (39 hex)
 The code point for the letter is now 249 (F9 hex)

So shall I go ahead with a patch adding cp1047.enc, cp37.enc and
posix-bc.enc, and perhaps some additions to t/lib/encode.t?
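
A round-trip check of roughly this shape could be a starting point for
such a test.  It is only a sketch: it assumes an Encode installation in
which a 'cp1047' encoding is already available, uses Test::More rather
than whatever style t/lib/encode.t follows, and covers only the digits.

```perl
use strict;
use warnings;
use Test::More;
use Encode qw(encode decode);

# ASSUMPTION: the installed Encode knows an encoding named 'cp1047'.
my $digits = join '', 0 .. 9;
my $ebcdic = encode('cp1047', $digits);   # characters -> cp1047 octets

is(unpack('H*', $ebcdic), 'f0f1f2f3f4f5f6f7f8f9',
   'digits encode to F0 .. F9');
is(decode('cp1047', $ebcdic), $digits,
   'digits survive the round trip');

done_testing();
```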

Thanks for your help.

Peter Prymmer