On Monday 24 February 2003 13:07, ned+ietf-822(_at_)mrochek(_dot_)com wrote:
<snip>
So who wants to do the merge?
<snip>
I've implemented a deflate-8bit encoder for KMail using zlib.
I've used a mixture of the two drafts, actually:
1. I used the deflate algorithm (zlib's deflate()), not gzip (mainly b/c
zlib's interface to gzip files is - well - file-based, while the
deflate interface is stream based).
2. I didn't shift the octets other than when escaping (I think it's not
necessary; yEnc does this to get around the many-NULs problem, which
isn't present in deflate-8bit).
3. I escaped the following octet values:
0x00, 0x09, 0x0A, 0x0D, 0x20, 0x3D (NUL, HT, LF, CR, SP, '=')
by prepending '=' and shifting their octet value by 64, i.e.
NUL becomes '=@'
HT becomes '=I'
LF becomes '=J'
CR becomes '=M'
SP becomes '=`'
'=' becomes '=}'
I admit that 42 would probably create visually more pleasing escape
sequences:
'=*', '=3', '=4', '=7', '=J', '=g'
4. I've set the maximum line length to 78 (mostly to have more than one
line with small test vectors so CRLF injection can be tested).
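
For illustration, items 1 and 3 can be sketched as follows. This is a
Python sketch, not the KMail code: the function names are made up for
this example, and zlib.compress() is assumed to stand in for driving
zlib's deflate() directly. Line breaking (item 4) is left out here.

```python
import zlib

ESCAPE = 0x3D  # '='
# Octets from item 3: NUL, HT, LF, CR, SP and '=' itself.
SPECIAL = frozenset({0x00, 0x09, 0x0A, 0x0D, 0x20, 0x3D})


def escape_octets(data: bytes, shift: int = 64) -> bytes:
    """Prefix each special octet with '=' and add `shift` to its value.

    All other octets pass through unshifted (item 2).
    """
    out = bytearray()
    for octet in data:
        if octet in SPECIAL:
            out.append(ESCAPE)
            out.append((octet + shift) & 0xFF)
        else:
            out.append(octet)
    return bytes(out)


def encode_deflate_8bit(data: bytes, shift: int = 64) -> bytes:
    """Compress with zlib, then escape the result (hypothetical helper)."""
    return escape_octets(zlib.compress(data), shift)
```

With the default shift, escape_octets(b"\x00 =") yields b"=@=`=}", and
with shift=42 it yields b"=*=J=g", matching the tables above.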
ad line lengths:
I think that the line length should be free. Make it so that a line MUST
NOT be generated longer than 998 octets, but MAY be generated with as few
as 78 octets (counting the escape characters). Also, implementations MUST
accept any line length <= 998 and SHOULD accept arbitrary line lengths.
Injection of CRLF MUST NOT occur between the escape character and the
escaped octet, but robust implementations MAY accept a
"=" CRLF escaped-char
sequence as being equivalent to
"=" escaped-char.
ad empty bodies:
All other CTEs generate empty output if the input was empty. Plain
deflate() creates its 8 octet header and the 4 octet trailer/checksum
in this case.
I suggest that implementations SHOULD NOT encode empty input with
deflate-*, but MUST accept both the 12 octet and the zero octet form of
deflate-8bit-encoded content as meaning empty decoded content.
Analogously for deflate-base64.
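
On the decoding side, this suggestion could look roughly like the
following Python sketch. The function name is hypothetical,
zlib.decompress() is assumed as the inflate step, and the escape shift
defaults to 64 as in my implementation. A zero-octet body is treated
as empty content without calling the decompressor at all; a body that
carries the compressor's output for empty input goes through the
normal path and also decodes to nothing.

```python
import zlib

ESCAPE = 0x3D  # '='


def decode_deflate_8bit(body: bytes, shift: int = 64) -> bytes:
    """Decode a deflate-8bit body, accepting the zero-octet form."""
    # Injected line breaks carry no data; real CR/LF octets are escaped.
    stream = body.replace(b"\r", b"").replace(b"\n", b"")
    if not stream:
        return b""  # zero-octet form: empty decoded content
    raw = bytearray()
    octets = iter(stream)
    for octet in octets:
        if octet == ESCAPE:
            try:
                escaped = next(octets)
            except StopIteration:
                raise ValueError("dangling escape character") from None
            # Undo the shift applied by the encoder.
            raw.append((escaped - shift) & 0xFF)
        else:
            raw.append(octet)
    return zlib.decompress(bytes(raw))
```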
Attached is the result of applying my implementation of deflate-8bit to
input consisting of the octet values 0x00..0xFF, in order. The CRLF is
present as LF, i.e. not in MIME-canonical form. The foo-42 example has
been created with 42 as escape shift, the other one with 64.
And two nitpicks about the naming: I think deflate-8bit is misleading,
since it's not the 8bit (identity) CTE that's applied to the deflate
output, but a shifting and/or escaping algorithm, resp. So what about
deflate-shifted? Or shifted-deflate (and thus base64-deflate)? Then
shifted could even conceivably be used separately, e.g. for already
compressed files. The second nitpick is the use of '-', which in
quoted-printable doesn't have any special meaning, but here separates
two independent layers. So how about making this deflate+shifted (or
shifted+deflate) and deflate+base64? This could be a naming convention
in case someone wants to use another algorithm on either side of the +
in the future, e.g. deflate+base85 or bzip2+shifted...
I knew you wouldn't like this :-)
Marc
--
memAlloc() Amnesia Error: Out of Memory
deflate-8bit-example
Description: Binary data
deflate-8bit-example-42
Description: Binary data