Re: gzip/deflate compression/encoding


Bruce Lilly <blilly(_at_)erols(_dot_)com> wrote:

I think we must have all had similar ideas in mind.


FYI, I based the numbers above on:
o output consists of an addition code (1 octet) followed by a block of
  modified output octets (each octet is input value plus addition code,
  modulo 256).


Well then, not so similar after all.  I'd never heard of nor thought of
that scheme.

  For N=3, M=84, expansion is 1.39%
  Expansion is independent of input octet sequence.


For typical (roughly random) input, the expansion of the
variable-length-block scheme is half that, and the worst case is only
slightly greater than 1.39%, at 14/972 = 1.44%.  (The 1.63% worst-case
quoted in the source code includes two filler bytes per line for
protecting leading double-hyphen and trailing whitespace, and would not
be a fair comparison with the 1.39% figure.)

  (of course the decoder needs to keep track of the data count up to
  the buffer length and needs to save and apply the offset code to
  each octet)


For a non-validating decoder for the variable-length-block scheme, all
the original bytes that weren't 0, 10, or 13 can be copied blindly
without being inspected.  For typical (roughly random) input that's
98.2% of the decoder's input bytes, most of them in runs of dozens of
consecutive bytes, so this scheme might be faster or consume less power,
especially if it can take advantage of block-move instructions.

The addition-code scheme has the advantage of being conceptually
simpler and easier to code.  You might want to forbid blocks from
straddling line breaks, so that a non-validating decoder can eliminate a
compare-and-branch operation from the processing of every byte.

AMC