ietf-822

Re: Content-Transfer-Encoding and yEnc

2002-04-10 09:08:03

They do? It must be really transparent then, because I myself keep
downloading zipped .pdfs and gzipped PostScript, and either I save it to
a file (on Linux) and deal with it myself, or IE opens the file with
WinZIP, and then I double-click it in WinZIP, which opens Acrobat, for
example. Pretty much exactly what I do with email, and it's annoying in
both cases.

Yes, that sounds extremely annoying, and that certainly doesn't happen to
me.  Apparently it's not as ubiquitous as it should be, but when I visit
gzipped content on the web, it's decompressed on the fly by the browser
and then handled as if it weren't compressed.

The HTTP profile of MIME is very different from that of email. In HTTP the
choice was made to compress objects as a whole, not on a leaf-by-leaf
basis. A Content-Encoding header was defined for this purpose, and it
supports gzip (as well as other encodings).

And since HTTP is single-hop, it is possible to negotiate encodings
and avoid sending something the receiver doesn't understand.
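
For example, a client that advertises gzip support gets compressed data
only when the server can actually supply it. A minimal sketch in Python
(the URL is made up, and urllib is used only to make the header exchange
concrete):

    import urllib.request

    # Advertise that we can handle gzip.
    req = urllib.request.Request("http://example.org/paper.ps",
                                 headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        # If the server chose to compress, it says so here; if not, the
        # header is simply absent and the body is sent as-is.
        print(resp.headers.get("Content-Encoding"))
        body = resp.read()   # urllib leaves any decompression to the caller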

A collection of data is a different type of object, one that's not really
dealt with by any of the discussion so far on this topic.  I think the
closest MIME currently gets to this is multipart/appledouble (or whatever
the correct MIME type is), which handles two-part Mac files.  I think an
archive file like a tar file or a ZIP file that contains multiple files
(as opposed to ZIP used purely as a compression algorithm) should be
treated as an application in its own right.  It's essentially a portion
of a file system, not an individual file.

Quite true. An archive possesses additional properties. I note in passing that
the trick of using archives for email has been tried in the past: The NeXT used
compressed tar files. It worked very poorly in practice, in part because the
archive's labelling capabilities weren't at all aligned with the needs of
message parts. There was a time when I got complaints about such messages
fairly regularly, but I haven't seen one in quite a while at this point.

The gzip+base64 situation is a complicated one, and I'm not sure of the
right way to handle it.  Intuitively, it seems to me like we need at least
the following CTEs:

    binary
    7bit
    8bit
    quoted-printable
    base64

    gzip (binary)
    gzip+base64
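
For concreteness, a compressed PostScript part under such a scheme might
be labelled something like this (purely hypothetical; no such CTE is
registered today):

    Content-Type: application/postscript
    Content-Transfer-Encoding: gzip+base64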

I suggest that these be gzip-8bit (the difference in overhead between this
and binary is insignificant, and the benefits of having line-oriented data
are huge) and gzip-base85 (the EBCDIC concerns that drove the choice of a
64-character alphabet over an 85-character one seem to be one of the few
things that are truly no longer a concern for email). But I agree this is
the right idea.
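
The overhead arithmetic is easy to check: base64 turns 3 octets into 4
characters (about 33% expansion), while an 85-character alphabet turns 4
octets into 5 characters (25%). A quick sketch, using Python's b85encode
purely as a stand-in for whatever 85-character alphabet were actually
chosen (the input filename is invented):

    import base64, gzip

    data = open("paper.ps", "rb").read()      # any sample file
    compressed = gzip.compress(data)
    print(len(compressed))                    # raw gzip output
    print(len(base64.b64encode(compressed)))  # ~4/3 of the gzip size
    print(len(base64.b85encode(compressed)))  # ~5/4 of the gzip size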

I'm not sure that we really need any kind of full layering, though, where
you can stack arbitrary encodings.  gzip+8bit makes no sense, for example,
and gzip+quoted-printable isn't going to win much in any situation over
gzip+base64.  If something like yEnc is introduced as another CTE, we'd
have to introduce it twice, once for uncompressed content and once for
compressed content... but I'm not sure that's really a problem.

Why bother? I believe it is easy to generate "uncompressed gzip" if you
really want to. And I'd rather solve the problem yEnc seeks to solve
as part of all this...
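
To illustrate the first point: most zlib/gzip implementations let you ask
for compression level 0, which produces a perfectly valid gzip stream whose
contents are stored rather than compressed. In Python, for instance (the
filename is invented):

    import gzip

    data = open("photo.jpg", "rb").read()   # content that won't compress further
    wrapped = gzip.compress(data, compresslevel=0)   # valid gzip, stored blocks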

I do think that we probably at least need to consider supporting two types
of compression.  gzip is a great default choice, sitting in a fairly sweet
spot between time and space tradeoffs, but bzip2 produces significantly
better compression if you're willing to take ~10x as long to compress, and
I expect that people will start asking for it fairly quickly.
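
The tradeoff is easy to measure on whatever data you care about. A rough
sketch (the filename is invented, and the numbers will vary a great deal
with the input):

    import bz2, gzip, time

    data = open("archive.tar", "rb").read()   # any large sample file
    for name, compress in (("gzip", gzip.compress), ("bzip2", bz2.compress)):
        start = time.time()
        out = compress(data)
        print(name, len(out), "bytes,", round(time.time() - start, 2), "s")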

Humpf. I see the rationale, but I'm really uncomfortable with more than
one compression scheme.

Well, the compression and encoding part of this still, to me, feels like a
transport layer that's clearly separable from the type of the content.
Archives (miniature file systems) are a more complex issue that I think
MIME will likely have to punt on and just assign content types to, because
you can't really do anything with them other than bring up a file system
browser that can launch applications on individual contents.

Absolutely. There's a place for these things. It just isn't in doing
general messaging.

I agree CTE seems a great place to put transport-layer compression.
But...  you can't do it without apps that don't understand the new CTEs
(which is all of them, right now) failing to decode the content, and in
addition they will treat it as application/octet-stream.

Isn't that what will happen? It just seems to me there is no migration
path. Am I missing something?

There's a migration path (you save the file and then decompress it) which
actually looks a little bit like what we have now, but yes, that part is
the only part that to me seems to leave something to be desired.  On the
other hand, is there any standardized way of handling compressed files
now?  (Are content types for gzip even registered?  I don't believe they
are.)  So from a standardization perspective, or even from a
deployed software perspective, what exactly are we transitioning from?
What do we have to have legacy support for?  Who are the current users and
what are they doing?

The rules say unknown encodings have to be presented as a series of octets
of unknown type. The rules also say you're supposed to offer to save such
stuff to a file. So you have the same options for dealing with the stuff
manually.

If what people are currently doing is using content types that are treated
as pretty much equivalent to application/octet-stream now, then the
transition isn't much of an issue.

Well, there's a bit of a mess in the case of gzip-8bit and email and
downgrading, but yes, gzip-base85 or gzip-base64 can both be managed IMO.

                                Ned