Another approach to Encodings

Excerpts from internet.ietf-822: 26-Apr-91 Re: multiple Content-Encodi..
Neil Katin(_at_)eng(_dot_)sun(_dot_)com (2788)

However, I have a lot of trouble supporting Nathaniel's proposal of
recursive encodings.  Here is my biggest problem with it: there is
no way, short of reversing the filter, to see what is on the inside
of the encapsulated message -- instead I have to uncompress the
entire package so I can read the actual type of the body part!

To me, this is a severe disadvantage.  I'd like to be "lazy"
in my decompression of the body part, and not do it until the user
has actually expressed an interest in seeing that body part.
But, I need to know the type upon receipt, in order to do
selective filtering on the message, and when displaying the
message header to the user.  Therefore I'd like all that information
available at the "top level" of the message, instead of encapsulated
"n" levels deep.


Well, this is interesting.  I've been moving in this direction myself. 
In particular, in my prototype implementation, I've discovered that a
lot of messiness accumulates around the notion of using
"content-encoding" on a multipart message.  In other words, consider a
message of this form:

From: blah, etc.
Content-type: multipart; 1-s; foobar
Content-encoding: base64

--foobar
Q29udGVudC10eXBlOiBzb21ldGhpbmctZWxzZQpDb250ZW50LWVuY29kaW5nOiBiYXNlNjQK

.... data....
--foobar (-end?)

First of all, what the HECK does
"Q29udGVudC10eXBlOiBzb21ldGhpbmctZWxzZQpDb250ZW50LWVuY29kaW5nOiBiYXNlNjQK"
mean?  Well, it is the following, encoded in base64:

Content-type: something-else
Content-encoding: base64

Nice, huh?  Now, consider what "...data..." means.  In the current
semantics, this is data that has been encoded in base64, and then the
base64 encoding was ITSELF encoded in base64.  This is silly.  It also
makes implementation harder than it needs to be.
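
The double encoding is easy to reproduce.  Here is a minimal Python sketch using only the standard base64 module.  It decodes the sample line from the message above, then pushes a payload through both layers.  The payload b"hello" is a made-up stand-in for the elided data, not anything from the original message.

```python
import base64

# The encoded line from the example message above.
ENCODED_HEADERS = (
    "Q29udGVudC10eXBlOiBzb21ldGhpbmctZWxzZQpDb250ZW50LWVuY29kaW5nOiBiYXNlNjQK"
)

# Decoding it once recovers the embedded part's headers.
inner_headers = base64.b64decode(ENCODED_HEADERS).decode("ascii")
print(inner_headers)

# Hypothetical payload standing in for the elided "...data...".
payload = b"hello"
once = base64.b64encode(payload)   # the inner part's own encoding
twice = base64.b64encode(once)     # the enclosing multipart's encoding on top

# A reader has to undo both layers to get back to the payload.
assert base64.b64decode(base64.b64decode(twice)) == payload
print(len(payload), len(once), len(twice))
```

Each layer adds roughly a third to the size, and neither the inner headers nor the payload can be read until the outer layer is undone.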

An obvious simplification would be to say that multipart messages can't
be encoded; only their subparts can be encoded.  Kind of elegant,
actually, because if we're talking about a transport-encoding, it is
something you only need to do ONCE, no more.
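
To make the benefit concrete, here is a rough Python sketch of a reader that relies on this rule.  Because the multipart wrapper itself is never encoded, the type of each subpart can be read straight off its headers, and the body is decoded only on request.  The sample body, the boundary handling, and the helper names split_parts and decode_part are all assumptions for illustration; the real boundary syntax is still unsettled.

```python
import base64

# Hypothetical message body: the multipart wrapper is left unencoded,
# and only the subpart body is base64 ("hello world").
BODY = """--foobar
Content-type: something-else
Content-encoding: base64

aGVsbG8gd29ybGQ=
--foobar
"""

def split_parts(body, boundary):
    """Return (headers, raw_body) pairs without decoding any body."""
    parts = []
    for chunk in body.split("--" + boundary + "\n"):
        if not chunk.strip():
            continue  # skip the empty pieces around the boundary lines
        head, _, raw = chunk.partition("\n\n")
        headers = {}
        for line in head.splitlines():
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        parts.append((headers, raw.strip()))
    return parts

def decode_part(headers, raw):
    """Undo the single encoding, only when the body is actually wanted."""
    if headers.get("content-encoding") == "base64":
        return base64.b64decode(raw)
    return raw.encode("ascii")

parts = split_parts(BODY, "foobar")
headers, raw = parts[0]
print(headers["content-type"])     # known without decoding anything
print(decode_part(headers, raw))   # decoded on demand
```

This is the "lazy" behavior asked for in the quoted message: filtering and header display need only the first step, and the decode step runs at most once per part.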

This suggests a couple of things.  First of all, it suggests that we
should avoid a syntax like the current one, which makes nested
Content-Encodings seem natural.  Second, it suggests that the people who
want a compressed content-encoding are probably right, and we should
bite the bullet and have one, rather than try to get it through
nestings.  Third, it suggests (to me) a tighter coupling between
content-type and content-encoding.  Greg was arguing, in St. Louis, for
prescribing the list of encodings that are "acceptable" for a given
content-type.  I'm moving in that direction myself, and I'm even ready
to go a step further:  If we're only going to have at most one encoding
applied to each body part, and if the set of encodings that are
"acceptable" for at least some body parts is going to be specified, then
the encoding information is beginning to sound, to me, like another part
of the content-type header field.

Yes, I'm proposing the same thing I recently proposed for charset, which
is simply folding more information into the content-type field --
something like this:

Content-type:  type [/ charset]  [ (encoding) ] [ ; ver-num [ ; resource-ref] ]

(Ignore the syntax for now.)  The encoding would be one of a very small
set -- probably base64, quoted-printable, and compressed.  Then, in the
prose describing each content-type, we could mention which encodings
were acceptable, if any.  For multipart (& singlepart, if we define a
single-encapsulated-message content-type), we would say NO encodings are
permissible, guaranteeing that the embedded headers remain readable, and
pushing off any necessary encoding to the embedded messages.
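
One way to picture the per-type rule is as a small lookup table.  The Python sketch below is only an illustration.  The three encoding names and the empty multipart entry come from the proposal above.  The "text" and "image" rows, the function name check_encoding, and the choice to always allow an unencoded body are assumptions.

```python
# Encodings named in the proposal above.
KNOWN_ENCODINGS = {"base64", "quoted-printable", "compressed"}

# Hypothetical per-type table.  Only the multipart entry comes from the
# proposal; the "text" and "image" rows are invented for illustration.
ACCEPTABLE = {
    "multipart": set(),                       # NO encodings permissible
    "text": {"quoted-printable", "base64"},   # assumption
    "image": {"base64", "compressed"},        # assumption
}

def check_encoding(content_type, encoding=None):
    """Return True if the encoding is acceptable for this content-type."""
    if encoding is None:
        return True   # assumption: an unencoded body is always acceptable
    if encoding not in KNOWN_ENCODINGS:
        return False
    return encoding in ACCEPTABLE.get(content_type, set())

print(check_encoding("multipart"))
print(check_encoding("multipart", "base64"))
print(check_encoding("text", "quoted-printable"))
```

With a table like this, a mail reader can reject an encoded multipart as soon as it sees the header, which is what keeps the embedded headers readable.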

I think the above proposal should satisfy everyone except the people who
really want nested transformations (the content-conversion school of
thought).  I believe, though, that there's nothing here to stop them
doing that via a mechanism external to this RFC.  So this proposal might
actually be a way out of some of our wrangling.  What do people think?