perl-unicode

Encode's .enc files and a question

2000-10-24 14:45:43

Hi,

I've finally been looking at the Encode module, and I am
somewhat perplexed by the header lines at the top of the Encode/*.enc
files.  They apparently correspond to the read() code, which
looks like:

 my $rep = $class->can("rep_$type");
 my ($def,$sym,$pages) = split(/\s+/,scalar(<$fh>));

I am curious about the viability of an EBCDIC-based .enc file, so
I started from Encode/iso8859-1.enc and came up with one that I
might call Encode/cp1047.enc.  If the format is right, I can prepare
this one along with a cp37.enc and a posix-bc.enc file as well.
Would the following be the correct form?

# Encoding file: cp1047, single-byte
S
003F 0 1
00
00000001000200030037002D002E002F001600050015000B000C000D000E000F
0010001100120013003C003D0032002600180019003F0027001C001D001E001F
0040005A007F007B005B006C0050007D004D005D005C004E006B0060004B0061
00F000F100F200F300F400F500F600F700F800F9007A005E004C007E006E006F
007C00C100C200C300C400C500C600C700C800C900D100D200D300D400D500D6
00D700D800D900E200E300E400E500E600E700E800E900AD00E000BD005F006D
0079008100820083008400850086008700880089009100920093009400950096
00970098009900A200A300A400A500A600A700A800A900C0004F00D000A10007
0020002100220023002400250006001700280029002A002B002C0009000A001B
00300031001A0033003400350036000800380039003A003B00040014003E00FF
004100AA004A00B1009F00B2006A00B500BB00B4009A008A00B000CA00AF00BC
0090008F00EA00FA00BE00A000B600B3009D00DA009B008B00B700B800B900AB
006400650062006600630067009E006800740071007200730078007500760077
00AC006900ED00EE00EB00EF00EC00BF008000FD00FE00FB00FC00BA00AE0059
004400450042004600430047009C004800540051005200530058005500560057
008C004900CD00CE00CB00CF00CC00E1007000DD00DE00DB00DC008D008E00DF
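
To check that I am reading the layout correctly, here is a minimal sketch
of how I would expect such a file to be parsed.  read_single_byte_enc is a
made-up helper, not anything in Encode.  The assumption that each page is a
hex page number followed by sixteen lines of sixteen four-digit values is
mine, inferred from iso8859-1.enc.  The sketch takes no position on whether
the index is the encoded byte and the value the Unicode code point, or the
reverse.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Encode): parse a single-byte .enc file
# laid out like the one above and return its header fields and page tables.
sub read_single_byte_enc {
    my ($fh) = @_;

    # Skip leading comment lines such as "# Encoding file: ...".
    my $line;
    do { $line = <$fh> } while (defined $line && $line =~ /^#/);
    die "missing type line\n" unless defined $line;

    chomp(my $type = $line);
    die "not a single-byte table: $type\n" unless $type eq 'S';

    # Same split as in the quoted read() code:
    # default character, symbol flag, number of pages.
    my ($def, $sym, $pages) = split(/\s+/, scalar(<$fh>));

    my %table;
    for (1 .. $pages) {
        chomp(my $page = <$fh>);    # page number in hex, e.g. "00"
        my @entries;
        for (1 .. 16) {
            my $row = <$fh>;
            die "page $page is truncated\n" unless defined $row;
            # Each line carries sixteen 4-hex-digit values (assumed layout).
            push @entries, map { hex($_) } $row =~ /([0-9A-Fa-f]{4})/g;
        }
        die "page $page has " . scalar(@entries) . " entries, expected 256\n"
            unless @entries == 256;
        $table{ hex($page) } = \@entries;
    }

    my %enc = (default => hex($def), symbol => $sym, pages => \%table);
    return \%enc;
}
```

Given an open filehandle, the entry at index 0x41 of page 00 would then be
$enc->{pages}{0}[0x41].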

Also: since the .enc files seem to have adopted a format of four
hex digits per code point, how is the Encode module going to
handle UTF-16 surrogates?
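
For concreteness, this is the arithmetic I have in mind.  A code point
above U+FFFF does not fit in one four-digit field, and in UTF-16 it is
split into two 16-bit surrogates.  A minimal sketch follows; to_surrogates
is a hypothetical name, not an Encode function, and whether the .enc format
would store such pairs at all is exactly what I am asking.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-16 code unit(s) for one code point.
sub to_surrogates {
    my ($cp) = @_;

    # Anything up to U+FFFF fits in a single 16-bit unit.
    return ($cp) if $cp <= 0xFFFF;
    die sprintf("U+%X is beyond the Unicode range\n", $cp) if $cp > 0x10FFFF;

    # Subtract 0x10000, then split the remaining 20 bits into two halves.
    my $v = $cp - 0x10000;
    return (0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF));
}

printf "%04X %04X\n", to_surrogates(0x10400);    # prints "D801 DC00"
```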

Thanks for any information.

Peter Prymmer