Gee, that sounds great. If the gzip/PKZIP algorithm is free of legal
entanglements, as seems to be the case, and produces reasonably
tight compression without too much CPU, why not go with it?
I think that is what someone -- Nathaniel? -- suggested a week or so
ago :-).
My only concern is that this area is, as others have pointed out, a
patent rat's-nest. Katz has released the algorithm he thinks he owns,
but whether that algorithm infringes on any of the patent claims is one
of those nasty little problems to which we will probably never know the
answer. The only thing that can be said with some confidence is
that, if his algorithm infringes, it is probably the case that any system
using a general dictionary-based approach would too. In other words, this is
probably as "free of legal entanglements" as we will find, but whether
that is zero or not may be an unanswerable question.
It seems
to me that "several compression algorithms and rules for choosing
among them" can be considered one more complex algorithm.
Sorry... wasn't clear. PKZIP, which is a product distributed under
license and for a fee, contains some interesting features, options, and
automated decision-making that I think PKWare considers proprietary and
which are not in gzip or any of the other variants. With the exception
of PKWare's encryption and probably authentication features, none of
those things prevent compressing and decompressing files (there might
be some performance implications, too).
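To give a concrete, if oversimplified, sense of what such automated
decision-making can look like, here is a toy sketch in Python; the two-way
"store vs. deflate" choice is an assumption for illustration, not a
description of PKZIP's actual logic:

    import zlib

    def compress_or_store(data):
        # "rules for choosing among algorithms" in its simplest form:
        # keep whichever of the two candidate representations is smaller.
        deflated = zlib.compress(data, 9)
        if len(deflated) < len(data):
            return "deflated", deflated
        return "stored", data   # incompressible data is kept as-is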
I think that we need to focus on gzip and the algorithm, and then be
pleased that there is a very widely available commercial product that
supports the same scheme and provides a basis for widespread
interoperability testing and verification, rather than seeing this as
"PKZIP".
Furthermore, it seems like producing binary output is more general, as
you can encode it as base64 if you need to, or not if you don't.
May be a reasonable way to look at it. My concern was that, since
the compression scheme does not produce "lines", one would need to
encode it for any purpose that involved transport over SMTP (even
extended/8bit SMTP). Since the encoding pushes the size of the file one
has just carefully compressed right back up, one would avoid that in the
best of all possible worlds.
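To make the size penalty concrete, here is a minimal sketch using Python's
standard gzip and base64 modules; the input file name is just an assumed
example:

    import gzip, base64

    original = open("message.txt", "rb").read()   # "message.txt" is a made-up example
    compressed = gzip.compress(original)          # binary stream, with no "lines" in it
    encoded = base64.encodebytes(compressed)      # 7bit-safe, line breaks every 76 characters

    # base64 output is 4/3 the size of its input (plus the line breaks), so the
    # encoded file ends up roughly a third larger than the compressed one.
    print(len(original), len(compressed), len(encoded))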
If one produced output in the restricted character set, with periodic line
breaks, directly from the compressor, there might be large gains in a 7bit
environment. And if one simply produced line breaks at frequent
intervals and somehow escaped any algorithm-produced CRLF sequences, there
would be very large gains in an ESMTP/8bitMIME one.
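A rough sketch of that second idea, purely as illustration: escape whatever
CR/LF bytes the compressor happens to emit, then insert real CRLFs at fixed
intervals. The escape byte and the whole scheme here are assumptions made up
for the example, not anything that has been proposed:

    ESC = 0xFF   # assumed escape byte; a real scheme would need to choose this carefully

    def escape_and_wrap(data, width=76):
        out = bytearray()
        for b in data:
            if b in (0x0D, 0x0A, ESC):     # escape CR, LF, and the escape byte itself
                out.append(ESC)
            out.append(b)
        # then insert genuine CRLFs at fixed intervals so the stream has "lines"
        lines = [bytes(out[i:i + width]) for i in range(0, len(out), width)]
        return b"\r\n".join(lines) + b"\r\n"

    # Overhead here is the occasional escaped byte plus the line breaks (a few
    # percent), rather than the one-third growth that base64 imposes.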
Surely the effort in doing the base64 encoding is not a problem?
I wasn't worried about the effort. I was worried about getting a
circa 3:1 size improvement from compression, then throwing half of that
away by forcing an encoding that might be only marginally necessary
(e.g., if you had 8bit transport available).
-john