ietf-822

Re: gzip/deflate compression/encoding

2005-06-27 07:34:17

On Sun June 26 2005 23:51, Laird Breyer wrote:

> Since support for compression is already built into the TLS protocol
> by RFC 3749, the benefit in lower bandwidth requirements and faster
> transfer can be achieved today as long as both the server and client use
> this feature.

That's a big "if".  Obviously, that's only hop-by-hop, and it doesn't
apply if TLS isn't available, or with TLS implementations that do not
support the RFC 3749 extension.

Such compression cannot work around e.g. SMTP message size limits as
the message would be stored in uncompressed form.

> On the other hand, the message sender is already free to compress his
> attachments any way desired,

Indicating the fact of compression, and the method, by what means? (Without
obscuring the nature of the media type that is compressed.)

There are two distinct issues at work:
1. encoding binary data to fit into 8bit (as opposed to 7bit) transport
   can be done in a more space-efficient manner than is possible with
   binary-to-7bit encoding
2. compression for size reduction of stored/transmitted content

> and introducing a compressed transfer encoding
> will seriously complicate MIME decoders.

It will add standard encodings, of course, and MIME decoders claiming
conformance to the encoding specification(s) would presumably have to
support them.  If the combination of compression and encoding is handled
by a Content-Transfer-Encoding tag alone, the possibly desirable
combinations might include:
1. binary compression alone
2. binary-to-8bit encoding alone
3. combination of binary compression followed by binary-to-8bit encoding
4. combination of binary compression followed by binary-to-7bit encoding
   (presumably base64)
That would expand the current 5 tags (quoted-printable, base64, 7bit, 8bit,
binary) to as many as 9 tags.
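The arithmetic behind "as many as 9" can be sketched as below. The extrapolation to more than one compression method (three further combined tags per method: compression alone, followed by binary-to-8bit, followed by base64) is an assumption drawn from the same reasoning, not something stated above.

```python
# Existing Content-Transfer-Encoding tags
EXISTING = ["7bit", "8bit", "binary", "quoted-printable", "base64"]


def combined_tag_count(n_methods: int) -> int:
    """Tags needed if every compression/encoding combination gets its own tag."""
    new_encoding = 1  # binary-to-8bit encoding alone
    per_method = 3    # compression alone; + binary-to-8bit; + base64 (assumed)
    return len(EXISTING) + new_encoding + per_method * n_methods


print(combined_tag_count(1))  # 9, matching the count above
print(combined_tag_count(2))  # 12 if a second compression method were added
```

Under this assumption the tag count grows with every compression method, which is the cost the alternative below avoids.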

As an alternative, if a separate MIME extension field and associated 
mechanism were developed for compression, the only encoding that is missing
from the current lineup is binary-to-8bit.
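To illustrate why a binary-to-8bit encoding can be more space-efficient than base64, here is a minimal sketch of a hypothetical encoding loosely modeled on yEnc (a Usenet scheme, not a MIME encoding). It escapes only the octets that 8bit transport cannot carry (NUL, bare CR and LF) plus its own escape character, and inserts CRLF line breaks. The names `encode_8bit`, `decode_8bit`, and the 128-octet line length are illustrative assumptions, not part of any specification.

```python
import base64
import os

# Hypothetical binary-to-8bit encoding, loosely modeled on yEnc: each octet is
# shifted by 42 (mod 256); results that would be NUL, LF, CR or the escape
# octet '=' are written as '=' followed by that value shifted by a further 64.
ESC = 0x3D  # '='
CRITICAL = {0x00, 0x0A, 0x0D, ESC}
LINE_LEN = 128  # break lines well below the 998-octet SMTP line limit


def encode_8bit(data: bytes) -> bytes:
    out = bytearray()
    col = 0
    for b in data:
        c = (b + 42) % 256
        if c in CRITICAL:
            # Escaped values (64, 74, 77, 125) are never critical themselves
            out += bytes([ESC, (c + 64) % 256])
            col += 2
        else:
            out.append(c)
            col += 1
        if col >= LINE_LEN:
            out += b"\r\n"  # line breaks carry no data
            col = 0
    return bytes(out)


def decode_8bit(text: bytes) -> bytes:
    out = bytearray()
    escaped = False
    for c in text:
        if c in (0x0A, 0x0D):  # skip CRLF line breaks
            continue
        if escaped:
            c = (c - 64) % 256
            escaped = False
        elif c == ESC:
            escaped = True
            continue
        out.append((c - 42) % 256)
    return bytes(out)


payload = os.urandom(30000)  # stand-in for compressed (random-looking) data
as_8bit = encode_8bit(payload)
as_base64 = base64.encodebytes(payload)  # base64 with 76-character lines
assert decode_8bit(as_8bit) == payload
print(len(payload), len(as_8bit), len(as_base64))
```

For random-looking input about 4 in 256 octets need escaping, so the overhead including line breaks is around 3%, versus about 35% for line-wrapped base64.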

A rough outline of how that might work:
o A Content-Compression field, field body consists of a registered keyword.
o Possibly provision for private-use unregistered (x-) tags.
o Initially defined keyword could be "gzip".
o An IANA registry for compression keywords and an associated registration
  procedure.
o Possibly a provision for field parameters, including provision for a
  filename parameter for use in the event that a compression method is
  unrecognized or where an implementation may wish to use an intermediate
  file between decoding and decompression.
o When compressing and encoding, compression is applied to media first,
  then encoding is applied to the compressed (binary) data.
o When decompressing and decoding, decoding is performed first, yielding
  binary data which is decompressed to yield the base media.
o Obviously, if only compression or encoding/decoding applies, there is no
  need to worry about the order of processing.
o If an encoding tag is unrecognized by a receiving implementation, no
  processing takes place (ideally with a notification to the user).
o Security considerations related to the compression method

Clearly, implementations claiming conformance with such a specification
would need to support compression and decompression (library code exists
for that purpose).  With compression indicated orthogonally to encoding,
a single compression method and a single new encoding, coupled with the
existing encodings, would suffice to cover all four combinations listed
above.
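A quick check of the coverage claim, treating compression and encoding as independent fields. The name `x-8bit` stands in for the single new binary-to-8bit encoding and is a hypothetical placeholder; nothing by that name is registered.

```python
from itertools import product

# Values of the two orthogonal fields (None = no Content-Compression field)
COMPRESSION = [None, "gzip"]
ENCODING = ["7bit", "8bit", "binary", "quoted-printable", "base64", "x-8bit"]

# The four combinations listed earlier, as (compression, encoding) pairs
WANTED = {
    "binary compression alone": ("gzip", "binary"),
    "binary-to-8bit encoding alone": (None, "x-8bit"),
    "compression, then binary-to-8bit": ("gzip", "x-8bit"),
    "compression, then binary-to-7bit (base64)": ("gzip", "base64"),
}

available = set(product(COMPRESSION, ENCODING))
assert all(pair in available for pair in WANTED.values())
print(len(available))  # 12 field combinations from 6 encodings x 2 compression states
```

One new encoding tag plus one compression keyword yields every wanted pair, plus combinations nobody had to enumerate in advance.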

If a receiving implementation supports encoding but not compression, there
are two cases:
A. the encoding (e.g. base64) is recognized and the content is decoded.
   That leaves binary data which can be saved to a file and manually
   decompressed.
B. the encoding is not recognized. The same situation applies as with a set
   of unrecognized new encoding tags: manual decoding and decompression can
   be used.
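The two fallback cases can be sketched as a receiver that knows the existing encodings but only some compression keywords. The `receive` function, its status strings, and the `part.bin` default filename are hypothetical; the `filename` parameter on `Content-Compression` follows the outline's "possibly" provision and is an assumption.

```python
import email
import gzip

# Hypothetical receiver: knows the existing transfer encodings, and only the
# compression methods listed in DECOMPRESSORS.
KNOWN_CTE = {"7bit", "8bit", "binary", "quoted-printable", "base64"}
DECOMPRESSORS = {"gzip": gzip.decompress}


def receive(raw: bytes, decompressors=DECOMPRESSORS):
    """Return (status, filename, data) for a single-part message."""
    msg = email.message_from_bytes(raw)
    cte = msg.get("Content-Transfer-Encoding", "7bit").strip().lower()
    method = (msg.get("Content-Compression") or "").split(";")[0].strip().lower()
    filename = msg.get_param("filename", "part.bin", header="Content-Compression")
    if cte not in KNOWN_CTE:
        # Case B: encoding unknown; hand over the body untouched so it can be
        # decoded and decompressed manually
        body = msg.get_payload().encode("ascii", "surrogateescape")
        return "save-undecoded", filename, body
    data = msg.get_payload(decode=True)
    if not method:
        return "ok", filename, data
    if method in decompressors:
        return "ok", filename, decompressors[method](data)
    # Case A: encoding decoded, compression unknown; binary data can be saved
    # to a file and decompressed manually
    return "save-compressed", filename, data
```

In both fallback cases nothing is discarded: the user gets either decoded binary data or the raw body, which is no worse than receiving unrecognized new encoding tags.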