
Re: Content-Transfer-Encoding and yEnc

2002-04-09 19:56:01

Sam Roberts <sroberts(_at_)uniserve(_dot_)com> writes:

> They do? It must be really transparent then, because I myself keep
> downloading zipped .pdfs and gzipped PostScript, and either I save it to
> a file (on Linux) and deal with it myself, or IE opens the file with
> WinZip, and then I double-click it in WinZip, which opens Acrobat, for
> example. Pretty much exactly what I do with email, and it's annoying in
> both cases.

Yes, that sounds extremely annoying, and that certainly doesn't happen to
me.  Apparently it's not as ubiquitous as it should be, but when I visit
gzipped content on the web, it's decompressed on the fly by the browser
and then handled as if it weren't compressed.
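
For what it's worth, the web analogue is HTTP's Content-Encoding header.
Here's a rough Python sketch of what the browser is doing on my behalf
(purely illustrative; example.org and the path stand in for any server
that honors Accept-Encoding: gzip):

    import gzip
    import http.client

    # Ask the server for compressed content, the way a browser does.
    conn = http.client.HTTPConnection("example.org")
    conn.request("GET", "/paper.ps", headers={"Accept-Encoding": "gzip"})
    resp = conn.getresponse()
    body = resp.read()

    # If the server compressed the body, undo it transparently before
    # handing the content to whatever renders the underlying type.
    if resp.getheader("Content-Encoding") == "gzip":
        body = gzip.decompress(body)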

> I agree it should be transparent. An "encapsulates" parameter to
> content-type would allow this, and downgrade gracefully, TO THE CURRENT
> SITUATION, as far as I can tell.

That is the problem with the Content-Transfer-Encoding approach: the
downgrade situation isn't as nice.  That means that encapsulation makes
for an easier transition, but using a CTE makes for a much nicer long-term
situation.

> You could look at it like that. Or you could say people take things
> like mpegs, and tar them up.

A collection of data is a different type of object, one that's not really
dealt with by any of the discussion so far on this topic.  I think the
closest MIME currently gets to this is multipart/appledouble (or whatever
the correct MIME type is), which handles two-part Mac files.  I think an
archive file like a tar file or a ZIP file that contains multiple files
(as opposed to ZIP used purely as a compression algorithm) should be
treated as an application in its own right.  It's essentially a portion
of a file system, not an individual file.

The situation is clearer with gzip than with ZIP, since gzip uses a single
file model and doesn't combine the issues of compression and archiving.
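
To make that concrete, here's a throwaway Python sketch (nothing about it
is proposed syntax; it just uses two standard-library modules): gzip wraps
exactly one byte stream, while a ZIP file is a container of named members,
i.e. a miniature file system.

    import gzip
    import io
    import zipfile

    payload = b"one logical file's worth of data"

    # gzip: a single compressed stream around a single logical file.
    compressed = gzip.compress(payload)
    assert gzip.decompress(compressed) == payload

    # ZIP: an archive holding multiple named members.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("a.txt", b"first member")
        archive.writestr("b.txt", b"second member")
    with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as archive:
        print(archive.namelist())   # ['a.txt', 'b.txt']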

> Then they take the tar file, and they gzip it. Then it gets base64
> encoded by the mail agent.

The gzip+base64 situation is a complicated one, and I'm not sure of the
right way to handle it.  Intuitively, it seems to me like we need at least
the following CTEs:

    binary
    7bit
    8bit
    quoted-printable
    base64

    gzip (binary)
    gzip+base64

I'm not sure that we really need any kind of full layering, though, where
you can stack arbitrary encodings.  gzip+8bit makes no sense, for example,
and gzip+quoted-printable isn't going to win much in any situation over
gzip+base64.  If something like yEnc is introduced as another CTE, we'd
have to introduce it twice, once for uncompressed content and once for
compressed content... but I'm not sure that's really a problem.
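
For concreteness, here's roughly all a gzip+base64 CTE would amount to on
the wire, as a Python sketch (the helper names are mine, not proposed
protocol elements): compress first, then wrap the result in base64 so it
survives 7bit transports.

    import base64
    import gzip

    def encode_gzip_base64(body):
        # Hypothetical gzip+base64 CTE: compress, then make it 7bit-safe.
        return base64.encodebytes(gzip.compress(body))

    def decode_gzip_base64(wire):
        # Undo the two layers in the opposite order.
        return gzip.decompress(base64.decodebytes(wire))

    original = b"Some message body that compresses well.\n" * 200
    wire = encode_gzip_base64(original)
    assert decode_gzip_base64(wire) == original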

I do think that we probably at least need to consider the need for two
types of compression.  gzip is a great default choice, sitting in a fairly
sweet spot between time and space tradeoffs, but bzip2 produces
significantly better compression if you're willing to take ~10x as long to
compress, and I expect that people will start asking for it fairly quickly.

I see no reason to include ZIP in the list of encodings.  As a compression
method, it doesn't seem to offer any interesting benefits that gzip and
bzip2 don't already provide, and it's not a stream-oriented format the way
gzip and bzip2 both are, which makes it much harder to handle internally
in the software that undoes CTEs.  (It's also not standardized in an RFC
so far as I know, and in fact a bzip2 encoding should probably wait for
standardization of that format as well.  gzip is documented in RFC 1952,
but should probably have a standards-track specification prior to a
standards-track RFC that specifies a gzip CTE.)
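
To illustrate the stream-orientation point (again just a Python sketch
under my own assumptions, not a claim about any particular mail reader):
a gzip body can be undone incrementally as the data arrives, whereas a
ZIP reader generally has to seek to the central directory at the end of
the file before it can do anything useful.

    import zlib

    def decode_gzip_cte(chunks):
        # Undo a hypothetical gzip CTE chunk by chunk, without buffering
        # the whole body or requiring a seekable file.  wbits=31 tells
        # zlib to expect gzip framing (header plus CRC trailer).
        decoder = zlib.decompressobj(wbits=31)
        for chunk in chunks:
            yield decoder.decompress(chunk)
        yield decoder.flush()

    # Usage: b"".join(decode_gzip_cte(chunks)) for any iterable of bytes.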

> The content-transfer-encoding/content-type pair is just 2 deep. It
> doesn't look like it can represent this.

Well, the compression and encoding part of this still, to me, feels like a
transport layer that's clearly separable from the type of the content.
Archives (miniature file systems) are a more complex issue that I think
MIME will likely have to punt on and just assign content types to, because
you can't really do anything with them other than bring up a file system
browser that can launch applications on individual contents.

> I agree CTE seems a great place to put transport-layer compression.
> But apps that don't understand the new CTEs (all of them, right now)
> won't decode the content, and in addition they'll treat it as
> application/octet-stream.
>
> Isn't that what will happen? It just seems to me there is no migration
> path. Am I missing something?

There's a migration path (you save the file and then decompress it), which
actually looks a little bit like what we have now, but yes, that part is
the only one that seems to me to leave something to be desired.  On the
other hand, is there any standardized way of handling compressed files
now?  (Are content types for gzip even registered?  I don't believe they
are.)  So from a standardization perspective, or even from a
deployed software perspective, what exactly are we transitioning from?
What do we have to have legacy support for?  Who are the current users and
what are they doing?

If what people are currently doing is using content types that are treated
as pretty much equivalent to application/octet-stream now, then the
transition isn't much of an issue.

-- 
Russ Allbery (rra(_at_)stanford(_dot_)edu)  <http://www.eyrie.org/~eagle/>