Re: gzip-8bit
2003-02-28 09:31:13

On Monday 24 February 2003 13:07, ned+ietf-822(_at_)mrochek(_dot_)com wrote:
<snip>
> So who wants to do the merge?
<snip>

I've implemented a deflate-8bit encoder for KMail using zlib. I've used a mixture of the two drafts, actually:

1. I used the deflate algorithm (zlib's deflate()), not gzip (mainly b/c zlib's interface to gzip files is - well - file-based, while the deflate interface is stream-based).

2. I didn't shift the octets other than when escaping (I think it's not necessary; yEnc does this to get around the many-NULs problem, which isn't present in deflate-8bit).

3. I escaped the following octet values: 0x00, 0x09, 0x0A, 0x0D, 0x20, 0x3D (NUL, HT, LF, CR, SP, '=') by prepending '=' and shifting their octet value by 64, i.e.
     NUL becomes '=@'
     HT  becomes '=I'
     LF  becomes '=J'
     CR  becomes '=M'
     SP  becomes '=`'
     '=' becomes '=}'
   I admit that 42 would probably create visually more pleasing escape sequences: '=*', '=3', '=4', '=7', '=J', '=g'

4. I've set the maximum line length to 78 (mostly to have more than one line with small test vectors so CRLF injection can be tested).

ad line lengths:
I think that the line length should be free. Make it so that a line longer than 998 octets MUST NOT be generated, but lines MAY be generated with as few as 78 octets (counting the escape characters). Also, implementations MUST accept any line length <= 998 and SHOULD accept arbitrary line lengths. Injection of CRLF MUST NOT occur between the escape character and the escaped octet, but robust implementations MAY accept a "=" CRLF escaped-char sequence as being equivalent to "=" escaped-char.

ad empty bodies:
All other CTEs generate empty output if the input was empty. Plain deflate() creates its 8 octet header and the 4 octet trailer/checksum in this case. I suggest that implementations SHOULD NOT encode empty input with deflate-*, but MUST accept both the 12 octet and the zero octet form of deflate-8bit-encoded content as meaning empty decoded content. Analogously for deflate-base64.

Attached is the result of applying my implementation of deflate-8bit to input consisting of the octet values 0x00..0xFF, in order. The CRLF is present as LF, i.e. not in MIME-canonical form. The foo-42 example has been created with 42 as escape shift, the other one with 64.

And two nitpicks about the naming:

I think deflate-8bit is misleading, since it's not the 8bit (identity) CTE that's applied to the deflate output, but a shifting and/or escaping algorithm. So what about deflate-shifted? Or shifted-deflate (and thus base64-deflate)? Then shifted could even conceivably be used separately, e.g. for already compressed files.

The second nitpick is the use of '-', which in quoted-printable doesn't have any special meaning, but here separates two independent layers. So how about making this deflate+shifted (or shifted+deflate) and deflate+base64? This could be a naming convention in case someone wants to use another algorithm on either side of the + in the future, e.g. deflate+base85 or bzip2+shifted...

I knew you wouldn't like this :-)

Marc

-- 
memAlloc() Amnesia Error: Out of Memory
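
For illustration only, the escaping, line-folding and empty-body rules described in this message could be sketched roughly as follows. This is Python, not the KMail code (which is not shown here), and the function names (escape_octets, fold, encode, decode) are hypothetical. It assumes that zlib.compress() output stands in for what zlib's deflate() produces, and that the shift is chosen so that no escaped octet becomes CR or LF (true for both 64 and 42).

```python
import zlib

ESCAPE = ord("=")
# Octets the message says are escaped: NUL, HT, LF, CR, SP and '=' itself.
SPECIAL = frozenset({0x00, 0x09, 0x0A, 0x0D, 0x20, 0x3D})


def escape_octets(data, shift=64):
    """Return one token per octet: the octet itself, or '=' plus the shifted octet."""
    return [
        bytes([ESCAPE, (b + shift) & 0xFF]) if b in SPECIAL else bytes([b])
        for b in data
    ]


def fold(tokens, max_len=78, eol=b"\r\n"):
    """Join tokens into lines of at most max_len octets.

    Breaks are only placed between tokens, so a line break never
    separates '=' from the octet it escapes.
    """
    lines, current = [], b""
    for tok in tokens:
        if current and len(current) + len(tok) > max_len:
            lines.append(current)
            current = b""
        current += tok
    if current:
        lines.append(current)
    return eol.join(lines)


def encode(data, shift=64, max_len=78, eol=b"\r\n"):
    """Compress, escape and fold; empty input yields empty output."""
    if not data:
        return b""  # the message suggests empty input SHOULD NOT be encoded
    return fold(escape_octets(zlib.compress(data), shift), max_len, eol)


def decode(encoded, shift=64):
    """Undo folding and escaping, then decompress.

    Raw CR and LF never occur in the payload (they are always escaped),
    so dropping them removes line breaks and also tolerates a break
    injected between '=' and the escaped octet.
    """
    payload = encoded.replace(b"\r", b"").replace(b"\n", b"")
    if not payload:
        return b""  # zero-octet form means empty content
    out = bytearray()
    octets = iter(payload)
    for b in octets:
        if b == ESCAPE:
            nxt = next(octets, None)
            if nxt is None:
                raise ValueError("dangling escape character")
            b = (nxt - shift) & 0xFF
        out.append(b)
    return zlib.decompress(bytes(out))
```

A test vector along the lines of the attachments would then be encode(bytes(range(256)), shift=64, eol=b"\n") and the same call with shift=42. The output is not guaranteed to match the attached files octet for octet, since that depends on the zlib version and compression settings used.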
deflate-8bit-example
deflate-8bit-example-42