Re: EBCDIC, uuencode, etc

Just some clarifications to John's remarks: (I was assuming that ASCII
chars got thru unchanged - I shouldn't assume such wonders :-)

Based on 27 EBCDIC national character sets:

  10646 name             number of EBCDIC sets missing in
# NUMBER SIGN            12
$ DOLLAR SIGN             8
@ COMMERCIAL AT          13
[ LEFT SQUARE BRACKET    20
\ REVERSE SOLIDUS        16
] RIGHT SQUARE BRACKET   20
^ CIRCUMFLEX ACCENT      17
     but the "NOT SIGN" character is defined in all those 17.
` GRAVE ACCENT           10
{ LEFT CURLY BRACKET     17
| VERTICAL LINE          16 (actually appearing as broken bar).
} RIGHT CURLY BRACKET    17
~ TILDE                  20

 Because of the problem, I'm not sure which characters are identified
 above.


I made it more evident. See above.

Conclusion: These 14 characters should not be used in a 64-char encoding.
With some good will you may be able to use !" and maybe also ^
as invariant characters.
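
As a rough illustration of that conclusion, here is a minimal Python
sketch that checks a candidate alphabet against the table. The counts
are copied from the table; the name problem_chars is made up for the
example, and it assumes that traditional uuencode uses code points 32
through 95.

```python
# Characters from the table, mapped to the number of the 27 EBCDIC
# national character sets that lack them.
MISSING_IN = {
    "#": 12, "$": 8, "@": 13, "[": 20, "\\": 16, "]": 20,
    "^": 17, "`": 10, "{": 17, "|": 16, "}": 17, "~": 20,
}


def problem_chars(alphabet):
    """Return the characters of `alphabet` that appear in the table, sorted."""
    return sorted(c for c in set(alphabet) if c in MISSING_IN)


# Assumption: the traditional uuencode alphabet, code points 32..95.
UUENCODE = "".join(chr(32 + i) for i in range(64))

for ch in problem_chars(UUENCODE):
    print(ch, "missing in", MISSING_IN[ch], "of 27 sets")
```

Under that assumption, seven of the tabulated characters fall inside
the uuencode range. The sketch does not cover ! and ", which are
discussed separately.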

You cannot discuss any of this without naming the characters because
I cannot tell which code page is being used nor which gateway may have
translated ISO to EBCDIC in the middle.


I was talking about EXCLAMATION MARK, QUOTATION MARK and CIRCUMFLEX ACCENT.

An IBM document contains the result of IBM analysis.  I discovered the
table by accident as I was browsing through it.  The only character here
but not in the above is the " (QUOTATION MARK).
In IBM C-H 3-3220-050, IBM Corporate Specification:  REGISTRY, Graphic
Character Sets and Code Pages, page 403 is Figure 5, Data Processing
Invariant Set, Syntactic Subset, 81 Characters Plus Space.  It includes:

A-Z,a-z,0-9 as above
       .<(+       FULL STOP (PERIOD), LESS THAN, LEFT PARENTHESIS, PLUS
&       *);       AMPERSAND, ASTERISK, RIGHT PARENTHESIS, SEMICOLON
-/     ,%_>?      MINUS, SOLIDUS (SLASH), COMMA, PER CENT, UNDERLINE,
                       GREATER THAN, QUESTION MARK
      :  '="      COLON, APOSTROPHE, EQUAL, QUOTATION MARK

and
SPace, EO (Eight Ones (X'FF'))

With SPace but not Eight Ones, the set contains 82 characters.


Which is one less than the invariant ISO 646 character set.
The character missing in the EBCDIC invariant collection is
EXCLAMATION MARK "!". 
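
The comparison can be sketched in Python as follows. This rests on two
assumptions: that the ISO 646 invariant set is printable ASCII (space
through tilde) minus the 12 national-use characters tabulated earlier,
and that the IBM figure contains both "(" and ")", which is what makes
its total come to 81. The variable names are illustrative only.

```python
import string

# Assumption: the 12 national-use positions tabulated earlier.
NATIONAL_USE = set("#$@[\\]^`{|}~")

# Printable ASCII, space (0x20) through tilde (0x7E): 95 characters.
PRINTABLE = {chr(c) for c in range(0x20, 0x7F)}
ISO646_INVARIANT = PRINTABLE - NATIONAL_USE

# Punctuation from the IBM figure (19 characters), assuming both
# parentheses are present.
IBM_PUNCT = ".<(+&*);-/,%_>?:'=\""
EBCDIC_INVARIANT = set(string.ascii_letters + string.digits + IBM_PUNCT + " ")

print(len(ISO646_INVARIANT), len(EBCDIC_INVARIANT))
print(sorted(ISO646_INVARIANT - EBCDIC_INVARIANT))
```

Under those assumptions the two sets have 83 and 82 members including
space, and the only difference is the exclamation mark.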

Well, it seems my analysis was more careful than IBM's,
as I also spotted the QUOTATION MARK, which is missing in the
character sets mentioned in my earlier article (4 sets in total).

This adds the following problem chars, beyond the ISO 646 and EBCDIC
problem characters: %&*;<>_

I agree with Randall that we should not restrict ourselves too much
because of these restricted character sets.


  Well, I'm not sure *I* agree.  As long as we need to have these
things pass transparently onto conforming, moral, upright hosts who
have not yet implemented support for RFC-XXXX, we need to be very
restrictive about these things.


What I meant was that this should not make us have to avoid these
"problem characters" in quoted-printable or quoted-readable.
Or should it? 

Surely it should affect our design of base64.
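
To make the design constraint concrete, here is a minimal Python sketch
that tests a 64-character alphabet against every character flagged in
this thread. The candidate alphabet (A-Z, a-z, 0-9, "+", "/", with "="
as a pad character) is an assumption for illustration, not something
agreed on here, and is_safe is a made-up helper name.

```python
import string

# Every character flagged as a problem somewhere in this thread.
PROBLEM = set("#$@[\\]^`{|}~") | set("!\"") | set("%&*;<>_")

# Hypothetical 64-character alphabet and pad character (assumptions).
CANDIDATE = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PAD = "="


def is_safe(alphabet):
    """True if `alphabet` has no repeats and no flagged characters."""
    chars = set(alphabet)
    return len(chars) == len(alphabet) and not (chars & PROBLEM)


print(len(CANDIDATE), is_safe(CANDIDATE + PAD))
```

With this candidate the check passes, since letters, digits, "+", "/"
and "=" all lie outside the flagged set.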

I believe that Roger's comments, Keld's comments, Stef's most recent
comments, and the position outlined above are consistent.  To rephrase
Stef's recent question, it seems to me that we may have reasonable
consensus on this, and on Base64.  Can we conclude that this is true
and get on with it?


By all means get on with it!

Keld