I've asked Gunther Schadow, one of the authors of draft-ietf-ediint-hl7.txt
(covering secure EDI transactions for HL7, which is used for healthcare), for
his thoughts on compressing messages and on adding compression to S/MIME.
Note that HL7 use has particular requirements, which are reflected in the
EDIINT-HL7 draft and in the comments below: small messages of a few K to a few
hundred K (nothing like the multimegabyte FEDI monsters), a generally
text-only environment with no easy way to handle binary data, batch-style
processing, and so on. One thing I'll have to add to the compression draft is
a comment on the relative merits of compress-before-sign vs
sign-before-compress.
Some of the EDIINT drafts have the following (or some variant thereof) to say
about S/MIME compression:
    The digital envelope does not perform data compression prior to encryption.
    There is yet no standard way to compress MIME-EDI entities before encryption,
    however, the EDIINT working group suggests using a header field named
    Content-Encoding as per HTTP [RFC 2068].
The recommendation for compression is either (1) use PGP or (2) use the above,
piping the data through compress or gzip (this works because it's batch
processing, typically done on Unix boxes). I'm only aware of one situation
where this type of processing is being used, and there they're not following
the above recommendation but just assuming the other side knows the data is
compressed. You don't want to know what sort of a hack this involves under
Windows now that sites are moving towards using NT for this: it was an obvious
and easy solution under Unix, but is almost impossible to move to Windows. The
more general solution is to use zlib and assume that the other side knows that
id-data is sometimes compressed data (this works because EDI messages are
always fixed-format text while compressed data is binary). Having been exposed
to these kludges is one of the reasons why I really want to see a standard
representation for compressed content in S/MIME.
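To make the "assume the other side knows" kludge concrete, here is a minimal
sketch of what a receiver ends up doing. It assumes the sender used zlib
(RFC 1950 framing) and that uncompressed content is plain text; the function
names are hypothetical, and none of this is taken from any EDIINT draft.

```python
import zlib

def looks_like_zlib(data: bytes) -> bool:
    """Heuristic check for a zlib (RFC 1950) header; not proof of compression."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Low nibble 8 means deflate; the 16-bit header must be a multiple of 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0

def recover_content(data: bytes) -> bytes:
    """Return the content, inflating it first if it appears to be compressed."""
    if looks_like_zlib(data):
        try:
            return zlib.decompress(data)
        except zlib.error:
            pass  # header matched by coincidence; treat as plain content
    return data
```

The receiver has to guess from the first two bytes and fall back if the guess
is wrong, which is exactly the sort of thing an explicit content type for
compressed data would avoid.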
I saw the first EDIINT draft some time in 1996 (although I don't know what it
had to say about compression back then), so they may have been waiting for
over three years, beating the 14-month record by a considerable margin.
Peter.
-- Snip --
Hi,
Interestingly, PGP has compressed the payload automatically from the
beginning. The reason is not just the size of the message itself; other
aspects also make compression before encryption favorable:
1. removing redundancy improves robustness against certain (or all?)
   decipherment attacks.
2. the payload to encrypt is smaller, which is why compressing before
   encryption can be more efficient than compressing after it. I tested
   it on a 300 kB text file that compresses to 16% of its size, and the
   pipe gzip|bdes still ran about 20% faster than bdes alone.
3. since encryption increases the entropy of the message, compression
   is less effective after encryption. Actually it doesn't work at
   all: gzipping a DES-CBC cryptogram will actually increase
   the size!
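Points 2 and 3 can be reproduced without bdes. The following is a minimal
sketch, assuming Python's zlib as a stand-in for gzip and random bytes from
os.urandom as a stand-in for a DES-CBC cryptogram (good ciphertext should look
like random data to a compressor). The sample text is made up, not real HL7
traffic.

```python
import os
import zlib

# Hypothetical sample: repetitive, EDI-like text (not real HL7 data).
plain = b"MSH|^~\\&|LAB|HOSP|EHR|HOSP|199907291534||ORU^R01|42|P|2.3\r" * 2000

# Stand-in for a cryptogram: random bytes of the same length as the text.
fake_cipher = os.urandom(len(plain))

compressed_plain = zlib.compress(plain)
compressed_cipher = zlib.compress(fake_cipher)

print("plain:            ", len(plain))
print("compressed plain: ", len(compressed_plain))
print("fake cryptogram:  ", len(fake_cipher))
print("compressed crypto:", len(compressed_cipher))
```

The compressed cryptogram should come out slightly larger than its input,
because deflate falls back to storing the data verbatim and adds its own
framing on top.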
This is all said for encryption. For signatures it may be better not
to compress, so as to have an immediately readable cleartext to store
with the signature. But hell, isn't S/MIME hiding the text behind a
blob of DER anyway? (I hate S/MIME for this.)
Notably, hashing (MD5) is far less expensive than DES-CBC, so
compressing before signing would not gain any significant
throughput, and the cost of decompressing again on every access to
the authenticated archive would be too high.
So, in an ideal world where we were using MOSS (the real
SECURE-MIME, nice MIME-based plain text), the most useful steps would
be:
1. sign
2. compress
3. encrypt
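The ordering above can be sketched as follows. This is only an illustration of
the sequence of steps, under loud assumptions: HMAC-SHA256 stands in for a
real digital signature, zlib stands in for the compressor, and the "cipher" is
a toy keystream that is NOT secure and is there only so the example runs with
the standard library. All function names are hypothetical.

```python
import hashlib
import hmac
import zlib

def toy_keystream_xor(key: bytes, data: bytes) -> bytes:
    """Placeholder cipher (SHA-256 in counter mode). NOT secure; illustration only."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)

def protect(message: bytes, sign_key: bytes, enc_key: bytes) -> bytes:
    # 1. sign: HMAC-SHA256 stands in for a signature; the tag is prepended.
    tag = hmac.new(sign_key, message, hashlib.sha256).digest()
    signed = tag + message
    # 2. compress the signed entity while it is still redundant text.
    compressed = zlib.compress(signed)
    # 3. encrypt the (now smaller) compressed data.
    return toy_keystream_xor(enc_key, compressed)

def unprotect(blob: bytes, sign_key: bytes, enc_key: bytes) -> bytes:
    # Undo the steps in reverse order: decrypt, decompress, verify.
    signed = zlib.decompress(toy_keystream_xor(enc_key, blob))
    tag, message = signed[:32], signed[32:]
    expected = hmac.new(sign_key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature check failed")
    return message
```

Because the signature is computed over the uncompressed text, the receiver
ends up with readable cleartext next to the signature once the outer two
layers are removed.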
Actually the little test I did has such a persuasive outcome that I'll
just show the figures here:
$ ls -l A*
-rw-r--r-- 1 schadow bin 335803 Jul 29 15:34 A      plain text
-rw-r--r-- 1 schadow bin 335808 Jul 29 15:39 Ac     DES-CBC cryptogram
-rw-r--r-- 1 schadow bin 335884 Jul 29 15:36 Acz    compressed crypto
-rw-r--r-- 1 schadow bin  55786 Jul 29 15:35 Az     compressed plain
-rw-r--r-- 1 schadow bin  55792 Jul 29 15:39 Azc    crypted compress
$ time gzip -c A | bdes -k onekeytwokeyslongkeyshortkey > Azc
real 0m0.335s
user 0m0.289s
sys 0m1.030s
$ time bdes -k onekeytwokeyslongkeyshortkey <A > Ac
real 0m0.433s
user 0m0.334s
sys 0m0.032s
$ time bdes -k onekeytwokeyslongkeyshortkey <A |gzip -f >Acz
real 0m0.666s
user 0m1.606s
sys 0m0.019s
$ time gzip -f <A >Az
real 0m0.257s
user 0m0.222s
sys 0m0.016s
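For reference, the ratios implied by the figures above can be worked out
directly. The sketch below only does arithmetic on the file sizes and the
"real" times copied from the listing; the variable names are mine.

```python
# File sizes in bytes, copied from the ls -l listing above.
sizes = {"A": 335803, "Ac": 335808, "Acz": 335884, "Az": 55786, "Azc": 55792}
# Wall-clock ("real") times in seconds, copied from the time output above.
real = {"gzip|bdes": 0.335, "bdes": 0.433, "bdes|gzip": 0.666, "gzip": 0.257}

ratio = sizes["Az"] / sizes["A"]            # compressed size / plain size
growth = sizes["Acz"] - sizes["Ac"]         # bytes added by gzip after DES-CBC
speedup = 1 - real["gzip|bdes"] / real["bdes"]

print(f"gzip leaves {ratio:.1%} of the plain text")
print(f"gzip after DES-CBC adds {growth} bytes")
print(f"gzip|bdes takes {speedup:.1%} less wall-clock time than bdes alone")
```

That works out to about 16.6% of the original size, 76 extra bytes when
compressing the cryptogram, and about 22.6% less wall-clock time for
gzip|bdes compared with bdes alone.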
I am not sure how much this matters for EDI since, as far as I know,
compressing EDI streams is quite uncommon. But why would you not want
to do it if the additional compression step can speed up the entire
encryption process? It's a win/win situation: where else can you save
both channel bandwidth and CPU load?
regards
-Gunther
Gunther Schadow ----------------------------------- http://aurora.rg.iupui.edu
Regenstrief Institute for Health Care
1001 W 10th Street RG5, Indianapolis IN 46202, Phone: (317) 630 7960
schadow(_at_)aurora(_dot_)rg(_dot_)iupui(_dot_)edu ----------------------
#include <usual/disclaimer>