ietf-822

Re: character sets

1991-05-07 12:40:03
So let's concentrate on the Quoted-Readable encoding, and
let's make it EBCDIC-safe i.e. unaffected by ASCII<->EBCDIC
conversions.

Unfortunately, this would require a transformation which is neither ASCII
nor EBCDIC, and liable to satisfy no one.

No, this would require the use of a set of characters that is common
to both ASCII and EBCDIC (actually, all versions of EBCDIC). This is
what BASE64 is all about, right? Actually, I hope we can find a larger
set of common characters. The BASE64 set would be rather limited for a
Quoted-Readable encoding. Any EBCDIC experts out there? How about
BASE85?

I have done some research on EBCDIC character sets.
I investigated 27 EBCDICs currently in use according to IBM.
The codes investigated are the I/O interface codes
in GA27-2837-9 IBM 3270 inf.disp.sys character set ref. ch. 10.
The characters varied in encoding at some well-defined positions.
IBM calls these codes national use. There are 14 of these.
If you look at the general EBCDIC code table, the following characters
are in "invariant EBCDIC":
SP .<(+&*);-/,%_>?:'=
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789

From ASCII we miss:
!"#$@[\]^`{|}~
This is equivalent to invariant ISO 646 minus the characters ! and ".
These characters are often then defined in national use positions.
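
The two lists above can be cross-checked mechanically. The following is only a sketch in Python: the names INVARIANT, PRINTABLE_ASCII, BASE64_ALPHABET and unsafe_chars are made up for illustration, and the BASE64 alphabet is assumed to be A-Z, a-z, 0-9, "+", "/" plus "=" for padding.

```python
import string

# Invariant EBCDIC characters as listed above (SP is the space character).
INVARIANT = set(" .<(+&*);-/,%_>?:'=" + string.ascii_letters + string.digits)

# Printable ASCII runs from 0x20 (space) through 0x7E (~).
PRINTABLE_ASCII = {chr(c) for c in range(0x20, 0x7F)}

# Assumed BASE64 alphabet, including the "=" padding character.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/="
)


def unsafe_chars(alphabet):
    """Return the characters of `alphabet` that are outside the invariant set."""
    return sorted(set(alphabet) - INVARIANT)


# Characters of printable ASCII that are not invariant, in code order.
missing = "".join(sorted(PRINTABLE_ASCII - INVARIANT))

if __name__ == "__main__":
    print(len(INVARIANT))                 # 81, counting the space
    print(missing)                        # !"#$@[\]^`{|}~
    print(unsafe_chars(BASE64_ALPHABET))  # []
```

Under these assumptions the invariant set has 81 characters including the space, so a 64-character alphabet fits but an 85-character one does not.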

" is missing only in the old ("Alternate") Austrian/German,
Danish/Norwegian, Finnish/Swedish and Spanish EBCDICs (4
character sets in all, all old). Elsewhere it always has the same code.

! is missing only in the old ("Alternate") Austrian/German,
Danish/Norwegian and Finnish/Swedish EBCDICs (3
character sets in all, all old) and the new Spanish and Spanish-speaking
EBCDICs (2 character sets). Elsewhere it is found at one of 2 different
places in the respective sets. Where ! is not present, a vertical line is
always present, in a place where ! is normally defined.

# is missing in 12 sets
$ is missing in 8 sets
@ is missing in 13 sets
[] is missing in 20 sets
\ is missing in 16 sets
^ is missing in 17 sets, but the not character is defined in all those.
` is missing in 10 sets
{} is missing in 17 sets
| is missing in 16 sets (broken bar).
~ is missing in 20 sets

Conclusion: These 14 characters should not be used in a 64-char encoding.
With some good will you may be able to use ! and " and maybe also ^
as invariant characters.
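
To see which of the 14 characters come closest to being usable, the counts above can be turned into a ranking. This is only a sketch in Python: the names TOTAL_SETS, MISSING_IN and ranked_by_availability are made up for illustration. It assumes that [ and ] are missing in the same 20 sets and { and } in the same 17, and it counts ! as missing in 3 + 2 = 5 sets and " in 4. It does not account for the not character standing in for ^.

```python
TOTAL_SETS = 27  # number of EBCDICs investigated

# Number of sets in which each character is missing, taken from the counts above.
# Assumption: paired brackets/braces are missing in the same sets.
MISSING_IN = {
    '"': 4, "!": 5, "#": 12, "$": 8, "@": 13,
    "[": 20, "]": 20, "\\": 16, "^": 17, "`": 10,
    "{": 17, "}": 17, "|": 16, "~": 20,
}


def ranked_by_availability(counts, total=TOTAL_SETS):
    """Return (char, sets_where_present) pairs, most widely available first."""
    pairs = [(ch, total - n) for ch, n in counts.items()]
    return sorted(pairs, key=lambda p: (-p[1], p[0]))


if __name__ == "__main__":
    for ch, present in ranked_by_availability(MISSING_IN):
        print(f"{ch} present in {present} of {TOTAL_SETS} sets")
```

By this count " and ! are present in 23 and 22 of the 27 sets, well ahead of the rest.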

Keld
