ietf-822

Re: gzip-8bit

2003-02-28 09:31:13
On Monday 24 February 2003 13:07, ned+ietf-822(_at_)mrochek(_dot_)com wrote:
<snip>
So who wants to do the merge?
<snip>

I've implemented a deflate-8bit encoder for KMail using zlib.
I've used a mixture of the two drafts, actually:
1. I used the deflate algorithm (zlib's deflate()), not gzip (mainly b/c
   zlib's interface to gzip files is - well - file-based, while the
   deflate interface is stream based).
2. I didn't shift the octets other than when escaping (I think it's
   not necessary; yEnc does this to get around the many-NULs problem,
   which isn't present in deflate-8bit).
3. I escaped the following octet values:
     0x00, 0x09, 0x0A, 0x0D, 0x20, 0x3D (NUL, HT, LF, CR, SP, '=')
   by prepending '=' and shifting their octet value by 64, i.e.
     NUL becomes '=@'
     HT  becomes '=I'
     LF  becomes '=J'
     CR  becomes '=M'
     SP  becomes '=`'
     '=' becomes '=}'
   I admit that a shift of 42 would probably create visually more
   pleasing escape sequences:
    '=*', '=3', '=4', '=7', '=J', '=g'
4. I've set the maximum line length to 78 (mostly to have more than one
   line with small test vectors so CRLF injection can be tested).
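
A minimal sketch of the escaping described in item 3, written in Python
rather than the zlib/C code used for KMail. The names ESCAPED, escape and
unescape are illustrative only and do not come from either draft; the
input is assumed to be the already-deflated octet stream, and line
folding is left out.

```python
# Octets that get escaped: NUL, HT, LF, CR, SP and '=' itself.
ESCAPED = frozenset({0x00, 0x09, 0x0A, 0x0D, 0x20, 0x3D})
ESCAPE_CHAR = 0x3D  # '='


def escape(data: bytes, shift: int = 64) -> bytes:
    """Prefix each special octet with '=' and add `shift` to its value."""
    out = bytearray()
    for octet in data:
        if octet in ESCAPED:
            out.append(ESCAPE_CHAR)
            out.append((octet + shift) & 0xFF)
        else:
            # All other octets pass through unshifted.
            out.append(octet)
    return bytes(out)


def unescape(data: bytes, shift: int = 64) -> bytes:
    """Inverse of escape(); rejects a trailing lone '='."""
    out = bytearray()
    it = iter(data)
    for octet in it:
        if octet == ESCAPE_CHAR:
            try:
                shifted = next(it)
            except StopIteration:
                raise ValueError("dangling escape character") from None
            out.append((shifted - shift) & 0xFF)
        else:
            out.append(octet)
    return bytes(out)


special = bytes([0x00, 0x09, 0x0A, 0x0D, 0x20, 0x3D])
print(escape(special, 64))  # the shift-64 sequences listed in item 3
print(escape(special, 42))  # the shift-42 alternative
```

Keeping the shift as a parameter makes it easy to compare the 64 and 42
variants on the same test vector.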

ad line lengths:
I think that the line length should be free. A line MUST NOT be 
generated longer than 998 octets, but MAY be generated with as few as 
78 octets (counting the escape characters). Also, implementations MUST 
accept any line length <= 998 and SHOULD accept arbitrary line lengths.

Injection of CRLF MUST NOT occur between the escape character and the 
escaped octet, but robust implementations MAY accept a
  "=" CRLF escaped-char
sequence as being equivalent to
  "=" escaped-char.
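
The folding rule can be sketched as follows (Python, illustrative names
fold and unfold, not taken from the drafts). The sketch assumes its
input is already escaped, so raw CR and LF never occur in it and every
CRLF in the folded output is an injected one. That assumption is also
what lets the lenient decoder simply drop all CRLF pairs, which accepts
the "=" CRLF escaped-char form as a side effect.

```python
def fold(escaped: bytes, max_len: int = 78) -> bytes:
    """Insert CRLF so no line exceeds max_len, never splitting '=' + octet."""
    lines = []
    line = bytearray()
    i = 0
    while i < len(escaped):
        # An escape pair is two octets and must stay on one line.
        if escaped[i] == 0x3D:
            unit = escaped[i:i + 2]
        else:
            unit = escaped[i:i + 1]
        if line and len(line) + len(unit) > max_len:
            lines.append(bytes(line))
            line = bytearray()
        line += unit
        i += len(unit)
    if line:
        lines.append(bytes(line))
    return b"\r\n".join(lines)


def unfold(folded: bytes) -> bytes:
    """Remove injected CRLFs, including one between '=' and its octet."""
    return folded.replace(b"\r\n", b"")


print(fold(b"ab=@cd", max_len=3))
```

With max_len=3 the pair "=@" does not fit after "ab", so it moves to
the next line as a whole instead of being split.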

ad empty bodies:
All other CTEs generate empty output if the input was empty. Plain 
deflate() creates its 8 octet header and the 4 octet trailer/checksum 
in this case.

I suggest that implementations SHOULD NOT encode empty input with 
deflate-*, but MUST accept both the 12 octet and the zero octet form of 
deflate-8bit-encoded content as meaning empty decoded content. 
Analogously for deflate-base64.
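
A rough sketch of that rule, using Python's zlib module as a stand-in
for the C zlib calls. It covers only the deflate layer (the escaping
and folding layer is left out), the function names are made up for
illustration, and the exact size of the compressed form of empty input
depends on the zlib framing in use, so no length is asserted here.

```python
import zlib


def encode_body(data: bytes) -> bytes:
    """SHOULD NOT run deflate on empty input: emit zero octets instead."""
    if not data:
        return b""
    return zlib.compress(data)


def decode_body(body: bytes) -> bytes:
    """Accept both the zero-octet form and a deflated empty stream."""
    if not body:
        return b""
    return zlib.decompress(body)


print(decode_body(b"") == decode_body(zlib.compress(b"")))
```

Both forms decode to the same empty content, so a receiver does not
need to know which choice the sender made.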

Attached is the result of applying my implementation of deflate-8bit to 
input consisting of the octet values 0x00..0xFF, in order. The CRLF is 
present as LF, i.e. not in MIME-canonical form. The foo-42 example has 
been created with 42 as escape shift, the other one with 64.

And two nitpicks about the naming: I think deflate-8bit is misleading, 
since it's not the 8bit (identity) CTE that's applied to the deflate 
output, but a shifting and/or escaping algorithm. So what about 
deflate-shifted? Or shifted-deflate (and thus base64-deflate)? Then 
shifted could even conceivably be used separately, e.g. for already 
compressed files. The second nitpick is the use of '-', which in 
quoted-printable doesn't have any special meaning, but here separates 
two independent layers. So how about making this deflate+shifted (or 
shifted+deflate) and deflate+base64? This could be a naming convention 
in case someone wants to use another algorithm on either side of the + 
in the future, e.g. deflate+base85 or bzip2+shifted...

I knew you wouldn't like this :-)

Marc

-- 
memAlloc() Amnesia Error: Out of Memory


Attachment: deflate-8bit-example
Description: Binary data

Attachment: deflate-8bit-example-42
Description: Binary data

Attachment: pgpWMefoDgG5Z.pgp
Description: signature
