Re: LZJU90 compression example(s)

FWIW, I think having a compressing transfer-encoding would be useful,
especially if it's rolled out at the same time as a "widetext" toplevel
type where compression would be particularly useful (especially if we
decide it'd be cleaner to send around pure UCS-4 rather than UTF-16).


I agree.

Now it's true that deploying such a transfer-encoding for general use in
email is unlikely without something like mailcap.


I agree with this as well.

It's also true that applying compression to already compressed objects
(e.g., mpeg, jpeg, etc) can cause an increase in size.  The solution is
not to use compression on such objects.


Seems reasonable enough to me.
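
A quick way to see the effect for yourself is the sketch below. It uses
Python's zlib module as a stand-in deflate implementation and random bytes
as a stand-in for already compressed data. Both choices are assumptions for
illustration; real JPEG or MPEG data is not random, it merely has little
redundancy left to remove.

```python
import os
import zlib

def deflate_size(data: bytes) -> int:
    """Return the size of data after zlib (deflate) compression at level 9."""
    return len(zlib.compress(data, 9))

# Highly redundant text, the kind of thing that compresses well.
text = b"Content-Transfer-Encoding: base64\r\n" * 200

# Random bytes of the same length, standing in for an already
# compressed object (an assumption, see the note above).
noise = os.urandom(len(text))

text_out = deflate_size(text)
noise_out = deflate_size(noise)

print("text: ", len(text), "->", text_out)
print("noise:", len(noise), "->", noise_out)
```

The second line of output should show no reduction at all: the compressor
falls back to storing the data and still has to add its own framing.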

I think we should consider LZJU90 as a potential alternative to gzip +
base64.  LZJU90 is certainly much simpler than gzip, although it also
provides significantly less compression.  From what I can tell, both
LZJU90 and gzip are based on LZ77, so the main difference is that gzip
adds Huffman coding with optional adaptive tables on top of what LZJU90
does and that gzip is far more widely deployed.


I'm afraid I have a big problem with this. First of all, all the debate about
compression speed seems to me to be beside the point. CPUs get faster all the
time and there's no indication this will change any time soon. Networks, on the
other hand, haven't increased in speed in a comparable fashion and there's no
indication this is going to change either, the promise of xDSL notwithstanding.

As such, it seems to me that we should go after just as much compression as we
can get without spending an excessive amount of time. And given this, deflate is
the clear hands-down winner over LZJU90.
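
For concreteness, here is a minimal sketch of what a deflate-then-base64
encoding could look like. It assumes Python's zlib module, which emits the
zlib wrapper around deflate rather than the gzip file format, and the
function names encode and decode are made up for this example. It is not a
proposal for the actual wire format.

```python
import base64
import zlib

def encode(body: bytes) -> bytes:
    """Deflate the body, then wrap it in MIME base64 with 76-column lines."""
    return base64.encodebytes(zlib.compress(body, 9))

def decode(wire: bytes) -> bytes:
    """Undo encode(): strip the base64 layer, then inflate."""
    return zlib.decompress(base64.decodebytes(wire))

sample = b"The quick brown fox jumps over the lazy dog.\r\n" * 100
wire = encode(sample)

print(len(sample), "bytes of text ->", len(wire), "bytes on the wire")
```

Even after the base64 expansion, redundant text like this comes out
smaller than it went in.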

Another problem I have with the current debate is this notion that it isn't
fair to compare the sample implementation in the draft with various other
things. We have ample experience in this area, more than enough to know that
whatever you publish in an RFC is what is going to get used in 99% of the
implementations out there. As such, if you want to claim that better
performance in terms of speed or compression can be had, put it in the version
in the document or I'm not interested in hearing about it. Alternately, don't
provide any implementation in the draft and instead provide a pointer to
a high quality reference implementation like zlib.

Indeed, I'd say that this last -- a high quality reference implementation -- is
an absolute requirement. The notion that MUA implementors are going to sit
around and figure out how to write better, more optimized LZJU90 encoders and
decoders simply isn't realistic.

A few specific problems with the current LZJU90 draft:

              * LZJU90 <name>

It's a layering violation for a content transfer-encoding to include a
filename.  Filenames belong in the Content-Disposition header.


Agreed. It also fails to meet internationalization requirements, which makes it
a total nonstarter.

The count makes me a bit uncomfortable.  I'm also inclined to say that CRCs
have little value at the application level these days.  I can't
say that I've ever seen a MIME object damaged in transit unintentionally.
If we're going to do any sort of integrity protection, it should be
cryptographic, IMHO.  CRCs just don't provide enough functionality for the
code complexity.


The bigger problem with this is that it isn't compatible with existing
decoders. People are going to want to reuse existing decoders to get to the
compressed content.

EBCDIC.  The 64 six-bit strings 000000 through 111111 are represented
by the characters "+", "-", "0" to "9", "A" to "Z", and "a" to "z".

This is the wrong base64 alphabet.  Use the same one that MIME uses, or go
for something more compact like base 85 (which makes sense if one is
trying to improve compression).


Agreed -- this is another total nonstarter. We already have far too many of
these alphabets floating around. If you want a 3-in-4 encoding, use base64 and
give up on the filename and related stuff. If you want a 4-in-5 encoding, use
one of the existing ones like BTOA. BTOA has the unfortunate problem of having
at least one variant with a filename at the top. But there is a BTOA variant
that doesn't, so that's the one I'd use.
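
To put numbers on 3-in-4 versus 4-in-5, here is a small sketch using
Python's base64 module. Its a85encode function implements Ascii85, which I
am using only as a convenient example of a 4-in-5 encoding; I am not
claiming it matches any particular BTOA variant byte for byte. The helper
name expansion is made up for this example.

```python
import base64

def expansion(encoded_len: int, raw_len: int) -> float:
    """Ratio of encoded size to raw size."""
    return encoded_len / raw_len

# 12 bytes: a multiple of both 3 and 4, so neither encoding needs padding.
# No 4-byte group is all zeros, which avoids Ascii85's "z" shortcut.
raw = bytes(range(1, 13))

b64 = base64.b64encode(raw)   # 3 bytes in, 4 characters out
a85 = base64.a85encode(raw)   # 4 bytes in, 5 characters out

b64_ratio = expansion(len(b64), len(raw))
a85_ratio = expansion(len(a85), len(raw))

print("base64: ", len(b64), "chars, ratio", round(b64_ratio, 3))
print("ascii85:", len(a85), "chars, ratio", round(a85_ratio, 3))
```

That is 16 characters against 15 for the same 12 bytes, i.e. roughly 33%
overhead against 25% before line breaks are counted.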

                                Ned