RE: Compressed data type for S/MIME

I've asked Gunther Schadow, one of the authors of draft-ietf-ediint-hl7.txt 
(covering secure EDI transactions for HL7, which is used for healthcare), for 
his thoughts on compressing messages and on adding compression to S/MIME.  
Note that HL7 use has particular requirements: small messages (a few K to a
few hundred K, nothing like the multimegabyte FEDI monsters), a generally
text-only environment with no easy ability to handle binary data, batch-style
processing, and so on.  These are reflected in the EDIINT-HL7 draft and in the
comments below.  One thing I'll have to add to the compression draft is a
comment on the relative merits of compress-before-sign vs
sign-before-compress.

Some of the EDIINT drafts have the following (or some variant thereof) to say 
about S/MIME compression:

The digital envelope does not perform data compression prior to encryption.  
There is yet no standard way to compress MIME-EDI entities before encryption, 
however, the EDIINT working group suggests using a header field named 
Content-Encoding as per HTTP [RFC 2068].


The recommendation for compression is either (1) use PGP or (2) use the above, 
piping the data through compress or gzip (this works because it's batch 
processing, typically done on Unix boxes).  I'm only aware of one 
situation where this type of processing is being used, although they're not 
following the above recommendation but just assuming the other side knows 
it's compressed (you don't want to know what sort of a hack this involves 
under Windows now that sites are moving towards using NT for this - it was 
an obvious and easy solution under Unix, but almost impossible to move to
Windows).  The more general solution is to use zlib and assume that the other
side knows that id-data is sometimes compressed data (it works in this case
because EDI messages are always fixed-format text while compressed data is
binary).  Having been exposed to these kludges is one of the reasons why I
really want to see a standard representation for compressed content in 
S/MIME.
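
To make the kludge concrete, here is a minimal sketch of what "assume the
other side knows that id-data is sometimes compressed" amounts to on the
receiving end.  The function name maybe_decompress and the sample HL7-style
segment are made up for illustration; this is not taken from any EDIINT draft
or real implementation.  It relies only on the fact that a zlib stream starts
with a recognisable two-byte header while EDI text does not.

```python
import zlib


def maybe_decompress(payload: bytes) -> bytes:
    """Guess whether payload is a zlib stream; return the text either way.

    Hypothetical helper: sniffs the two-byte zlib header (low nibble of the
    first byte is 8 for deflate, and the 16-bit header is a multiple of 31).
    """
    looks_like_zlib = (
        len(payload) >= 2
        and payload[0] & 0x0F == 8
        and ((payload[0] << 8) | payload[1]) % 31 == 0
    )
    if looks_like_zlib:
        try:
            return zlib.decompress(payload)
        except zlib.error:
            pass  # header matched by accident; treat as plain text
    return payload


# Made-up sample message, for illustration only.
edi_text = b"MSH|^~\\&|LAB|HOSP|EHR|HOSP|199907291534||ORU^R01|0001|P|2.3\r"
plain_result = maybe_decompress(edi_text)
packed_result = maybe_decompress(zlib.compress(edi_text))
```

Guessing like this is exactly the fragility that an explicit compressed-data
content type would remove.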

I saw the first EDIINT draft some time in 1996 (although I don't know what it 
had to say about compression back then), so they may have been waiting for 
over three years, beating the 14-month record by a considerable margin.

Peter.

-- Snip --

Hi,

Interestingly, PGP has had automatic compression of the payload from the
beginning. The reason is not just the size of the message itself; other
aspects also make compression before encryption favorable:

1. removing redundancy improves robustness against certain (or all?)
   decipherment attacks.

2. the payload to encrypt is smaller, which is why compressing before
   encryption can be more efficient than compressing after it. I tested
   it on a roughly 300 kB text file that compresses to about 16% of its
   size, and the pipe gzip|bdes still ran about 20% faster than bdes alone.

3. since the output of encryption looks statistically random,
   compression is ineffective after encryption. In fact it doesn't
   work at all: gzipping a DES-CBC cryptogram will actually increase
   the size!
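
Point 3 can be reproduced without a cipher at hand.  The sketch below is an
illustration only: it uses os.urandom output as a stand-in for a cryptogram
(an assumption, on the grounds that good ciphertext is statistically
indistinguishable from random bytes) and a made-up, highly repetitive
HL7-style segment as the plain text.

```python
import os
import zlib

# Made-up, repetitive text standing in for a batch of EDI messages.
text = b"MSH|^~\\&|LAB|HOSP|EHR|HOSP|199907291534||ORU^R01|0001|P|2.3\r" * 2000

# Stand-in for a cryptogram of the same length (assumption: ciphertext
# behaves like random bytes as far as a compressor is concerned).
random_like = os.urandom(len(text))

packed_text = zlib.compress(text)
packed_random = zlib.compress(random_like)

print("plain text: ", len(text), "->", len(packed_text))
print("random data:", len(random_like), "->", len(packed_random))
```

The text shrinks dramatically, while the random input comes out slightly
larger than it went in, since the compressor can only add its own framing
overhead.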

This is all said for encryption. For signatures it may be better not
to compress so as to have an immediately readable cleartext to store
with the signature. But hell, isn't S/MIME hiding the text behind a
blob of DER anyway? (I hate S/MIME for this)

Notably, hashing (MD5) is far less expensive than DES-CBC, so
compressing before signing would not gain any significant
throughput, and the cost of repeated decompression for every access to
the authenticated archive would be too high.

So, in an ideal world where we were using MOSS (the real
SECURE-MIME, nice MIME-based plain text), the most useful order of
steps would be:

1. sign
2. compress
3. encrypt
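
The ordering of these three steps can be sketched as follows.  This is a toy
illustration only, not MOSS or S/MIME: the "signature" is an HMAC-SHA256 tag
standing in for a real public-key signature, and the "cipher" is a
hypothetical SHA-256 counter-mode keystream that is NOT secure and is used
here only because the Python standard library has no DES.  The function names
protect, unprotect and toy_keystream_xor are invented for this sketch.

```python
import hashlib
import hmac
import zlib

TAG_LEN = 32  # length of an HMAC-SHA256 tag in bytes


def toy_keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stream cipher placeholder (NOT secure); XOR is its own inverse."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def protect(message: bytes, mac_key: bytes, enc_key: bytes) -> bytes:
    tag = hmac.new(mac_key, message, hashlib.sha256).digest()  # 1. sign (stand-in)
    packed = zlib.compress(tag + message)                      # 2. compress
    return toy_keystream_xor(enc_key, packed)                  # 3. encrypt


def unprotect(blob: bytes, mac_key: bytes, enc_key: bytes) -> bytes:
    data = zlib.decompress(toy_keystream_xor(enc_key, blob))   # undo 3, then 2
    tag, message = data[:TAG_LEN], data[TAG_LEN:]
    expected = hmac.new(mac_key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):                 # check 1
        raise ValueError("authentication tag mismatch")
    return message


message = b"OBX|1|NM|GLU^Glucose||95|mg/dL|70-105|N|||F\r" * 500  # made-up text
blob = protect(message, b"mac-key", b"enc-key")
recovered = unprotect(blob, b"mac-key", b"enc-key")
```

Because the tag is computed over the clear text before compression, the
receiver ends up with readable text plus its tag after undoing the outer two
layers.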

Actually the little test I did has such a persuasive outcome that I'll
just show the figures here:

$ ls -l A*

-rw-r--r--  1 schadow  bin  335803 Jul 29 15:34 A       plain text
-rw-r--r--  1 schadow  bin  335808 Jul 29 15:39 Ac      DES-CBC cryptogram
-rw-r--r--  1 schadow  bin  335884 Jul 29 15:36 Acz     compressed crypto
-rw-r--r--  1 schadow  bin   55786 Jul 29 15:35 Az      compressed plain
-rw-r--r--  1 schadow  bin   55792 Jul 29 15:39 Azc     crypted compress

$ time gzip -c A | bdes -k onekeytwokeyslongkeyshortkey > Azc

real    0m0.335s
user    0m0.289s
sys     0m1.030s

$ time bdes -k onekeytwokeyslongkeyshortkey <A > Ac

real    0m0.433s
user    0m0.334s
sys     0m0.032s

$ time bdes -k onekeytwokeyslongkeyshortkey <A |gzip -f >Acz

real    0m0.666s
user    0m1.606s
sys     0m0.019s

$ time gzip -f <A >Az

real    0m0.257s
user    0m0.222s
sys     0m0.016s
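
For convenience, the headline ratios can be worked out from the figures
above.  The snippet below only does arithmetic on the file sizes and "real"
times exactly as printed in the listing; the variable names are mine.

```python
# File sizes in bytes, copied from the ls -l listing.
sizes = {"A": 335803, "Ac": 335808, "Acz": 335884, "Az": 55786, "Azc": 55792}

# Wall-clock ("real") times in seconds, copied from the time output.
real = {"gzip|bdes": 0.335, "bdes": 0.433, "bdes|gzip": 0.666, "gzip": 0.257}

compression_ratio = sizes["Az"] / sizes["A"]           # compressed / plain
growth_after_encrypt = sizes["Acz"] - sizes["Ac"]      # bytes added by gzip
time_saved = 1 - real["gzip|bdes"] / real["bdes"]      # fraction of time saved

print(f"compressed plain text is {compression_ratio:.1%} of the original")
print(f"gzip after encryption adds {growth_after_encrypt} bytes")
print(f"gzip|bdes takes {time_saved:.1%} less wall-clock time than bdes alone")
```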

I am not sure how much this matters for EDI since as far as I know
compressing EDI streams is quite uncommon. But why would you not want
to do it if the additional compression step can speed up the entire
encryption process?  It's a win/win situation: where else can you save
both channel bandwidth and CPU load?

regards
-Gunther

Gunther Schadow ----------------------------------- http://aurora.rg.iupui.edu
Regenstrief Institute for Health Care
1001 W 10th Street RG5, Indianapolis IN 46202, Phone: (317) 630 7960
schadow(_at_)aurora(_dot_)rg(_dot_)iupui(_dot_)edu ---------------------- 
#include <usual/disclaimer>