ietf-822

Re: gzip/deflate compression/encoding

2005-06-29 06:10:57

On Wed June 29 2005 03:02, Laird Breyer wrote:

On Jun 29 2005, Bruce Lilly wrote:

MIME transfer encoding is typically used end-to-end.  It doesn't
matter where the bottleneck is, and no special software support is
required at intermediate sites.

Yes, this is the crucial point. You've identified one problem at intermediate
nodes of the network, namely the SMTP size limit, which depends on the
installed MTA software. 

and administrative policy (and possibly, depending on implementation,
factors such as available storage capacity).

With CTE: No software change at intermediate nodes is required. 
 Major change at all client leaf nodes is required just to continue
using email, because the CTE makes the compressed content opaque.

Not "all", only those wishing to use the particular transfer encoding.
A sender who uses base64 alone and recipients of such content need not
be concerned with other encodings.
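
To make that concrete, here is a rough sketch using Python's standard
email package (addresses and content are invented); only the two end
points need to understand the attachment, while the base64 CTE keeps
everything 7bit-safe in transit:

    import gzip
    from email.message import EmailMessage

    payload = gzip.compress(b"example spreadsheet data " * 200)  # stands in for pre-compressed content
    msg = EmailMessage()
    msg["From"] = "sender@example.org"        # hypothetical addresses
    msg["To"] = "recipient@example.org"
    msg["Subject"] = "compressed attachment"
    msg.set_content("See attached.")
    # add_attachment() chooses Content-Transfer-Encoding: base64 for
    # binary parts, so intermediate nodes see only ordinary 7bit text.
    msg.add_attachment(payload, maintype="application", subtype="gzip",
                       filename="data.gz")
    print(msg)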

Without CTE, but TLS: changes at intermediate nodes and leaf nodes 
bring definite benefits, but no change is required of all clients at once.

Base TLS benefits are orthogonal to the issue; optional compression over
TLS doesn't address a crucial problem (the SMTP size limit), and involves
multiple CPU-intensive compress/decompress cycles.
 
And finally: if the user sends precompressed documents as 8bit attachments 
from his MUA, no changes at all are required today. This is the purest
end-to-end scenario.

Not quite.  Any such scheme requires some sort of cooperation at least at
the end points (sender and receiver need to agree on the method, packaging,
etc.).  Some schemes require implicit cooperation of intermediate nodes
(not adding empty lines, not folding long lines, not adding or removing
whitespace at the end of lines, not mangling lines that happen to begin
with "From ", etc.).  If MIME is used, then there is some hypothetical and
probably private-use encoding or media type or both, which sender and
recipient need to agree upon.  If not MIME, then sender and recipient also
need to agree on all of the equivalent framework that MIME provides
(checksums, language indication, parameters, fitting into the Internet
Message Format, and so on).
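
The transparency hazard is easy to see: compressed output is essentially
random bytes, so the sequences that non-8-bit-clean paths alter appear
all through it.  A small illustrative check:

    import gzip, os

    blob = gzip.compress(os.urandom(50000))
    # bare CR and LF bytes occur throughout the output; any node that
    # "normalizes" line endings or trailing whitespace will corrupt it
    print(blob.count(b"\r"), blob.count(b"\n"))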

Indeed, compression is always an option whether or not a CTE is available.
Just because the sender pushes an uncompressed message over the wire doesn't
mean that the receiver must accept the uncompressed stream into a holding
buffer the same size as the message; it can be compressed as it is received. 

Perhaps, but the SMTP SIZE indication is based on message DATA size as
transferred at the application protocol level, not on some hypothetical
compression.
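
For reference, the advertised limit is visible in the EHLO response; a
short sketch with a hypothetical relay name:

    import smtplib

    s = smtplib.SMTP("mail.example.org")   # hypothetical relay
    s.ehlo()
    # "size" is the RFC 1870 limit on the DATA stream as transferred;
    # it says nothing about what the content might compress to
    print("advertised limit:", s.esmtp_features.get("size"))
    s.quit()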
 
If you're going to give the SMTP receiver an option to rewrite the
message encoding with a CTE, then on-the-fly compression in RAM or on
disk is also an option which compares favourably.

I'm not quite sure what you're getting at...  Hypothetically an MTA
could apply a CTE to meet transport requirements, but there is no
requirement for a particular implementation method.
 
Sorry, I was aware of that; I meant a second CTE compression on top
of the video stream.

That would be ineffective, as discussed in a previous message.  N.B. the
same applies to compression over TLS and to filesystem compression.
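
The point is easy to demonstrate; high-entropy input stands in for any
already-compressed stream:

    import os, zlib

    data = os.urandom(100000)      # stands in for already-compressed content
    once = zlib.compress(data)
    twice = zlib.compress(once)
    print(len(data), len(once), len(twice))
    # each pass comes out slightly larger: only overhead is added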

Regarding text compression, big savings are possible, 
but normally text isn't the bulkiest part of a large email, so compression is 
not crucial.

Apparently from comments in this thread there is some desire to compress
repetitive text-like stuff ("all that XML").
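
For what it's worth, that kind of content is the best case for a
compressor; made-up but representative markup:

    import zlib

    xml = b"<cell row='1' col='A'><value>42</value></cell>\n" * 5000
    print(len(xml), "->", len(zlib.compress(xml)))
    # highly repetitive markup typically shrinks by well over 90%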

No, the information is all there (both formats can be opened in OpenOffice
and contain the same spreadsheet data).

Except for VB macros and whatever else Microsoft decided to put in
their files. I've been told that Excel is Turing complete...

OpenOffice includes the macros, but comments them out in the .sxc format,
since OpenOffice doesn't interpret the same macro language.

I really don't know; it seems easy to define these CTEs, but once they exist
everyone is stuck with them and there's no backwards compatibility.

Backward compatibility is indeed a concern, and is why proliferation of
transfer encodings is undesirable.  Backward compatibility is going to
take a hit no matter what, so it's a matter of considering the pros and
cons and trying to design a solution that inflicts minimum harm both
now and for the foreseeable future.  If a MIME-version increment is
warranted, then it would be prudent to minimize the chance that future
extensions would need yet another increment.
 
If it really has to be tried out, wouldn't a few content types
make more sense to test the waters?

There is a strong tradition against registration of media types which
are really transfer encodings.  The biggest problem would be under the
text media type, where some built-in compression would almost certainly
conflict with the characteristics that text types are supposed to have
(e.g. "software must not be required in order to get the general idea
of the content").  (And I would probably be among the first to complain.)

A CTE which replaces them can 
always be defined some years later, and in the meantime it doesn't
force all standards-compliant software to implement a decompressor.

In what way would delaying the introduction of a backwards
incompatibility be a good thing?  The MIME RFCs are currently at Draft
status.  Introduction of new encodings now might force a reset to Proposed.
How would advancement to full Standard soon, followed by a later reset
to Proposed, be advantageous?