Re: gzip/deflate compression/encoding

2005-07-01 20:02:54

On Fri July 1 2005 18:33, ned+ietf-822(_at_)mrochek(_dot_)com wrote:
> It is relatively easy to design a scheme that limits the
> overhead to 1-2% no matter what the input.

Maybe, depending on the constraints.  The minimum constraints on the
output are:
o CRLF only for line endings, no lone CR or lone LF
o no NUL
o line length <= 998 octets, excluding the terminating CRLF
(for that is the definition of 8bit).  The input of course is an
unconstrained sequence of octets.
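
To make the three output constraints concrete, here is a minimal checker sketch in Python.  The function name is_8bit_clean and the max_line parameter are made up for illustration; this only tests whether a candidate output obeys the constraints listed above and says nothing about how an encoder would produce it.

```python
def is_8bit_clean(data: bytes, max_line: int = 998) -> bool:
    """Return True if data meets the three output constraints above.

    Hypothetical helper for illustration only.
    """
    # Constraint: no NUL octets anywhere.
    if b"\x00" in data:
        return False
    # Split on CRLF; any CR or LF left inside a piece must be a lone one.
    for line in data.split(b"\r\n"):
        if b"\r" in line or b"\n" in line:
            return False
        # Constraint: at most max_line octets between CRLF pairs.
        if len(line) > max_line:
            return False
    return True
```

For example, is_8bit_clean(b"hello\r\nworld\r\n") is true, while any input containing a bare LF, a bare CR, or a NUL is rejected.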

About 1.4% expansion should be possible with only those constraints,
a fairly simple algorithm (faster decode than encode), and moderate
encoder memory requirements (an 84 octet input buffer).

Adding constraints makes things more difficult.  For example,
constraining line length to 76 octets, as is the case with
the other CTEs, necessitates an expansion by a factor of 78/76,
which is about 2.6% overhead exclusive of any other considerations.

Staying within 2% expansion while avoiding lone CR, lone LF, and NUL
implies a line length of about 250 octets.
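
The line-break arithmetic in the two paragraphs above can be sketched as follows.  The function name line_break_overhead is hypothetical, and the sketch counts only the 2-octet CRLF appended to each full line; whatever it costs to avoid lone CR, lone LF, and NUL comes on top of this and depends on the scheme.

```python
def line_break_overhead(line_len: int) -> float:
    """Fractional expansion from adding a 2-octet CRLF per line_len octets.

    Hypothetical helper; ignores any escaping cost.
    """
    return 2 / line_len


# Line lengths mentioned in this message: 76 (other CTEs), 250, 998 (8bit limit).
for n in (76, 250, 998):
    print(f"{n:4d} octets/line: {line_break_overhead(n):.2%} line-break overhead")
```

At 76 octets per line the CRLFs alone cost about 2.6%, which already exceeds a 2% budget.  At 250 octets they cost 0.8%, which, if the costs are taken as roughly additive, leaves about 1.2% for avoiding the forbidden octets and sequences.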

Other constraints include amount of memory an encoder or decoder
might require, restrictions on leading or trailing whitespace,
avoiding other troublesome output sequences, whether or not a
stream can be encoded on the fly, and complexity of encoder and/or