Re: Massive Content-Type definition ideas & Gopher

1993-06-07 22:01:51
To:  ietf-822@dimacs.rutgers.edu
Date:         7 Jun 1993  16:36 EDT

< From: Keith Moore <moore@cs.utk.edu>
< ...
< It's not anywhere nearly that bad.  First of all, we don't need multiple
< compression algorithms -- we need one that everyone can use freely.

I think this is a poor assumption. It wasn't that long ago that compress
came out; up until then, Huffman encoding was considered to be "it". 

I don't know offhand when compress was written, but various adaptive
compression methods have been around since at least the mid-70s.  The
algorithm that 'compress' was based on was novel when it appeared because it
generated *fixed-length* compression codes; it was designed to be efficient
to implement in hardware.  'compress' became popular because it appeared at
the right time (filling a "void") and was freely distributed in source code.
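For the curious, the dictionary-building scheme behind 'compress' (LZW) is
simple enough to sketch.  This is an illustrative toy in modern Python, not
the actual 'compress' implementation -- real 'compress' packs its codes into
9-16 bit fields and can reset its table, while this sketch just emits integer
codes:

```python
def lzw_compress(data: bytes) -> list[int]:
    # The table starts with every single-byte string pre-loaded.
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])        # emit code for the longest match
            table[wc] = next_code       # learn the new string
            next_code += 1
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decompress(codes: list[int]) -> bytes:
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    result = table[codes[0]]
    w = result
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:         # the classic "KwKwK" corner case
            entry = w + w[:1]
        else:
            raise ValueError("bad LZW code")
        result += entry
        table[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return result
```

The decoder rebuilds the same table the encoder built, one step behind, which
is why no dictionary ever has to be transmitted.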

If you count the various computing communities (especially the PC world) you
find half a dozen or so "symbol" (as opposed to "signal") compression
algorithms in widespread use.  Some are better than others, but there's not a
large difference in efficiency between them.  What this says to me is that
we've hit a wall of sorts.
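As a rough illustration of the "not a large difference" point, one can line up
a few widely used symbol compressors.  This sketch uses the modern Python
standard library as a stand-in for the tools of the day; the exact ratios
depend entirely on the input:

```python
import bz2
import lzma
import zlib

# Redundant text, the easy case for any symbol compressor.
text = b"the quick brown fox jumps over the lazy dog. " * 200

for name, fn in [("zlib/deflate", zlib.compress),
                 ("bzip2", bz2.compress),
                 ("lzma", lzma.compress)]:
    out = fn(text)
    print(f"{name}: {len(out)} bytes ({len(out) / len(text):.1%} of original)")
```

All three land in the same general neighborhood on typical text; none of them
is an order of magnitude ahead of the others.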

But my crystal ball is fuzzy on this.  I don't know enough about data
compression to be able to say that nobody will make a breakthrough.  (Though
I'm cynical enough to believe that if somebody does make a breakthrough, it
will be encumbered by patent restrictions.)  If a fantastically better way of
compressing text is discovered, we could of course define new encodings for it.

My main concerns are (a) that whatever compression mechanism we build work
within the existing MIME structure, and (b) that we not define half a dozen
different compression algorithms for MIME when one is about as good as another.

Then along comes LZ, LZW, and all the other variants along those lines, the
latest being the one used by gzip. How much longer until a newer algorithm
comes along that beats the pants off of these algorithms? Once we've chosen
the one, we can't later just say "oops, we chose the wrong one".  At 
that point it would be difficult to switch. Will we always be able to say
"this algorithm is the best for compression"?

The gzip algorithm is not new, and doesn't perform hugely better than other
algorithms that have been around for a while.  What is novel about the gzip
algorithm is that it appears not to be encumbered.  (It's also nice that
there already exist multiple, interoperable, independently derived
implementations of the 'deflate' algorithm that gzip uses.)
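To illustrate the interoperability point: the same deflate stream appears
under several framings (gzip, zlib, raw), and independently written decoders
handle it.  A small sketch in modern Python -- gzip output unwrapped directly
by the zlib API, two implementations agreeing on one deflate core:

```python
import gzip
import zlib

payload = b"MIME bodies compress well when they are text. " * 50

# The gzip file format is just a deflate stream plus a header and trailer.
gz = gzip.compress(payload)

# zlib can unwrap the gzip container itself: wbits=31 selects
# "deflate with gzip framing" in the zlib API.
assert zlib.decompress(gz, wbits=31) == payload
```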

What I'm trying to say is that choosing a single algorithm doesn't permit
future growth. It's always better to make an escape hatch available.

In principle, I agree.  And we do have an escape hatch; we can always define
additional content-transfer-encodings.  But it doesn't seem likely that we
will use this escape hatch anytime soon, and it doesn't seem worth the
trouble to add an extra layer that will (a) significantly change the existing
MIME framework, (b) make MIME more complicated, and (c) not be used very often.

< Second, there would only be two new encodings added: one using base64 and
< another using binary.  Since compression presumably generates a stream of
< random-looking octets, quoted-printable wouldn't ever be optimal or 
< useful for readability, and compressed-{7,8}bit wouldn't be useful at all.

You may be right here; we may be able to get by with just adding versions 
of base64 and binary.  But then, we may not. How clear is your crystal 
ball? :-)

Seems pretty clear on this one.  "compressed-binary" is obvious -- if you've
got binary transparency, you're not going to want to go to the trouble of
compressing something (gaining 50% or so) just to give up 25% of that gain to
encode in base64.  "compressed-base64" is obvious also, since we've gone to
the trouble to figure out the maximal set of "email-safe" characters.  Of
course, conditions change, so we might decide later that we can transmit a
wider range of characters and squeeze more bandwidth out of the deal.  But the
incremental gain over base64 will be small.  And it seems like any new
transports that will be defined will be binary-transparent; there's a huge
gain to be had and it isn't really difficult to do nowadays.
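The arithmetic above is easy to check.  Compressed output looks like random
octets, so random bytes model it well: base64 expands them by a factor of
exactly 4/3, while quoted-printable does far worse.  A sketch in modern
Python:

```python
import base64
import os
import quopri

# Compressed data is effectively random, so random bytes stand in for it.
body = os.urandom(30_000)

b64 = base64.b64encode(body)        # 3 input bytes -> 4 output characters
qp = quopri.encodestring(body)      # most random octets become "=XX" triples

print(f"base64 expansion: {len(b64) / len(body):.2f}x")
print(f"quoted-printable expansion: {len(qp) / len(body):.2f}x")
```

On random input, quoted-printable typically more than doubles the size, which
is why it's never the right choice for compressed bodies.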

< I'd personally like to avoid making such a drastic change to the MIME
< framework.   Also, if memory serves, we have discussed this topic before
< and ruled out multi-layer encodings.

No, I think the topic was postponed for perusal later on, after the MIME 
spec got out the door.

Maybe so.  By all means, let's discuss how compression should best be
integrated into the MIME framework.  But let us also have a respect
for existing (and soon to be released) MIME-based products and avoid
breaking them more than necessary.

<< For example,
<<    Content-Transfer-Encoding: base64; compression="CompressionAlgorithm1"
<<    Content-Transfer-Encoding: 8bit; compression="CompressionAlgorithm1"

< This would probably break current MIME readers that would ignore the
< parameter.

Would current MIME readers be able to easily handle ANY method of extending
encoding to handle compression? Adding support for compressed-binary and
compressed-base64 is just as intrusive.

The best we can do for existing MIME readers is to label compression such
that they will issue a message of the form "Sorry, I don't support this
encoding".  The worst we can do is create a condition where a compressed body
part appears perfectly normal to an existing MIME reader, say by adding an
extra header or parameter.  Then the mail reader will *think* it understands
the message, when it doesn't, and attempt to display it anyway.
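The difference can be sketched concretely.  A reader that dispatches on the
whole encoding token fails loudly on "compressed-base64", while the parameter
scheme lets that same reader "succeed" and hand the user compressed bytes.
A hypothetical dispatch in modern Python (the KNOWN table and decode_body
are illustrative, not taken from any real mail reader):

```python
import base64
import quopri

# The decoders an existing (pre-compression) MIME reader might know about.
KNOWN = {
    "7bit": lambda b: b,
    "8bit": lambda b: b,
    "binary": lambda b: b,
    "base64": base64.b64decode,
    "quoted-printable": quopri.decodestring,
}

def decode_body(cte_header: str, body: bytes) -> bytes:
    # Parameters after ";" are ignored, as many readers ignore what they
    # don't understand.
    token = cte_header.split(";")[0].strip().lower()
    if token not in KNOWN:
        # A new *token* fails safely: the user is told it's unsupported.
        raise ValueError(f"Sorry, I don't support this encoding: {token!r}")
    # But a new *parameter* on a known token sails right through: the
    # reader decodes the base64 and displays compressed bytes as if they
    # were the real content.
    return KNOWN[token](body)
```

So "compressed-base64" produces the benign "Sorry, I don't support this
encoding" behavior, while 'base64; compression="..."' produces silent garbage.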