To the MIME working group: I posted this to the PEM development
mailing list, and I'd like to get some reactions from you MIME folks
too. I'm not on the MIME mailing list, so please reply directly to me
as well as the MIME mailing list.
---------------------------------------------------------------------
To: The PEM development committee
From: Philip Zimmermann, prz(_at_)sage(_dot_)cgd(_dot_)ucar(_dot_)edu
Re: A refinement to Radix-64?
Well, I know it's disastrously late to make any suggestions about
changing PEM format, but this one might be painless enough and
compelling enough to entertain.
As you all know, RFC1113 currently uses the following table
(represented in C) for binary-to-ASCII radix 64 notation...
char bintoasc[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
Now we've all done enough programming to have at one time or another
written a binary-to-hex conversion function that uses the following
table...
char bintohex[] =
"0123456789ABCDEF";
This ordering of characters has been the universal representation of
radix-16 numbers since IBM (I guess it was) introduced it in the
stone age. It is used for all kinds of applications for dumping
bits in a form convenient for printing.
In fact, this radix-16 table can be used in that form to display
numbers in any base from 2 to 16, by using the appropriate left
subset of the table. The digits are in that same order for any radix
up to 16. Not to belabor the obvious, but note that the table is
merely an extension of the digits "0123456789", with some extra
digits added to bring it up to 16 digits. They didn't invent a new
sequence from scratch-- they just added more, leaving the old decimal
sequence intact. It would not have been as appealing if they had
just invented any arbitrary character sequence for the 16 hex digits.
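To make the "left subset" point concrete, here is a minimal sketch in C. The function name to_radix and its buffer interface are my own illustration, not part of any standard; it just indexes the first `radix` entries of the hex table.

```c
#include <limits.h>
#include <stddef.h>

static const char bintohex[] = "0123456789ABCDEF";

/*
 * Hypothetical helper: write `value` in the given radix (2..16) into buf,
 * using only the leftmost `radix` characters of the hex table.
 * Returns buf, or NULL if the radix is out of range or buf is too small.
 */
char *to_radix(unsigned long value, unsigned radix, char *buf, size_t buflen)
{
    char tmp[sizeof(unsigned long) * CHAR_BIT]; /* enough digits for base 2 */
    size_t n = 0, i;

    if (radix < 2 || radix > 16)
        return NULL;

    /* Produce digits least-significant first. */
    do {
        tmp[n++] = bintohex[value % radix];
        value /= radix;
    } while (value != 0);

    if (n + 1 > buflen)
        return NULL;

    /* Reverse into the caller's buffer and terminate. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

With this sketch, 255 comes out as "FF" in base 16, "377" in base 8, and "11111111" in base 2, all from the one table.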
Well, this RFC1113 method of displaying numbers in a kind of radix 64
could also be used in many of the same applications that were
formerly done in hex, but more compactly. That's why it was invented.
But not just for PEM-- it could replace hex in many other applications
as well. That would be nifty, possibly becoming as widely known as
the old standard hex representation.
But it sure would be neater if this new radix-64 sequence left the
old radix-16 sequence intact, and just added to it, just as radix-16
just added more digits to radix-10. So why not make the
radix-64 sequence like this:
char bintoasc[] =
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/";
It's exactly the same set of characters, but reordered to build on
the radix-10 and radix-16 sets. And like the radix-16 set, left
substrings could be used for any radix less than 64. If someone had
an exotic need for a radix-32, for example, the same ordering could
be used. This sequence would be more "mathematically correct" (at
least historically) for representing radix 64. This sequence would
have wider utility than merely as an email transport armor-- it would
become *THE* new (begin underline and boldface) RADIX-64! (end
underline and boldface) It could become generally useful as a way of
representing numbers in base 64, just as standard hex became THE way
of representing numbers in base 16. Maybe future versions of C would
include it as a new printf numeric format descriptor, like "%x" is
today. Well okay, let's not get too grandiose here-- I was just
trying to emphasize the point.
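As a rough sketch of what this buys, the C fragment below checks which of the two orderings begins with the standard hex digits, and prints a number in any radix up to 64 from the reordered table. The names extends_hex and to_radix64 are hypothetical, chosen only for this illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

static const char rfc1113_digits[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char proposed_digits[] =
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/";

/* Returns 1 if the first 16 characters of table are the usual hex digits. */
int extends_hex(const char *table)
{
    return strncmp(table, "0123456789ABCDEF", 16) == 0;
}

/*
 * Hypothetical helper: write `value` in any radix from 2 to 64 using the
 * leftmost `radix` characters of the proposed table.
 * Returns buf, or NULL if the radix is out of range or buf is too small.
 */
char *to_radix64(unsigned long value, unsigned radix, char *buf, size_t buflen)
{
    char tmp[sizeof(unsigned long) * CHAR_BIT]; /* enough digits for base 2 */
    size_t n = 0, i;

    if (radix < 2 || radix > 64)
        return NULL;

    /* Produce digits least-significant first. */
    do {
        tmp[n++] = proposed_digits[value % radix];
        value /= radix;
    } while (value != 0);

    if (n + 1 > buflen)
        return NULL;

    /* Reverse into the caller's buffer and terminate. */
    for (i = 0; i < n; i++)
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return buf;
}
```

Here extends_hex() is true for the proposed table and false for the RFC1113 one. With the proposed table, 255 still prints as "FF" in base 16, and becomes "3/" in base 64 (3*64 + 63).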
Well, I doubt that radix 64 will ever catch on like hex did, despite
its compactness. Hex represents nybbles, which is more convenient
for aligning with byte boundaries. And radix 64 does have the + and
/ in it, which could mess up a C expression. But nobody is going to
mistake a dumped bit stream for a C expression anyway. In any case,
that doesn't seriously detract from my arguments above.
I think it's compelling to note that the new ordering appears to be a
practically painless change, at least on purely technical grounds.
The same set of characters is used, exhibiting the same robustness to
damage from email gateway conversions. Not a single extra machine
cycle is wasted converting to or from the new radix 64 set, because
it's all just a few minor table entry changes. There is no down
side, as far as I can see-- just a serendipitous benefit of spinning
off a new potentially useful universal method of representing higher
radixes (radixi?) than base 16. Isn't that nice? Clean? Elegant?
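To illustrate why the change costs nothing at run time, here is a minimal sketch of the inner encoding step with the table passed in as a parameter. The function name encode_group and the table names are assumptions made for this example, and padding of a short final group is left out.

```c
static const char rfc1113_table[] =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
static const char reordered_table[] =
"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/";

/*
 * Hypothetical helper: encode one full 3-byte group as 4 printable
 * characters. Each 6-bit value is simply an index into `table`, so the
 * work done is identical whichever ordering is supplied.
 * `out` is not NUL-terminated.
 */
void encode_group(const unsigned char in[3], char out[4], const char *table)
{
    out[0] = table[in[0] >> 2];
    out[1] = table[((in[0] & 0x03) << 4) | (in[1] >> 4)];
    out[2] = table[((in[1] & 0x0F) << 2) | (in[2] >> 6)];
    out[3] = table[in[2] & 0x3F];
}
```

For the three bytes "Man", the RFC1113 table gives "TWFu" and the reordered table gives "JM5k". The shifts and masks are the same in both cases; only the table contents differ.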
Acceptance of this new proposal only requires overcoming political
inertia. Not just from PEM, but from the MIME people too. But all
of these are still infant standards, I think. I don't recall if MIME
is a true standard yet, but as I recall it refers to RFC1113 for its
radix 64 format, so maybe we can kill both birds with one stone.
There is no massive installed base of software that would be
disrupted, as far as I know. And even if there is some installed
base of prototype software, the change is truly simple to
implement-- there are precious few cases where an important change
can be really trivial to make, but this is one of them.
Doesn't all this sound worth persuading everybody to apply this tiny
little patch to the table, and letting nature run its course? Sure,
it's late. But it doesn't have to get that ensnarled in committees.
Wouldn't it be nice and painless to just have everybody nod their
approval, type a few keys on the keyboard, and "make it so"? At
least we should get everybody concerned to look at the idea. It
would be a shame if everyone thought it was a good idea, and would
like to see it changed that way, but balked at changing it because
they didn't know everyone else felt the same way.
Well, what do you all think?
BTW-- could somebody circulate this through the MIME people too,
because I lost the email address to do that myself. Are there any
other groups? Internet committees? Other email standards bodies
that use the same thing? Can they all get to see this idea before
it's too late?
Thanks for your attention.