pem-dev

Re: Canonical forms (Was: Re: PEM/MIME Encryption)

1994-12-21 12:50:00
To say that there is a lack of consensus on what I proposed would be a mild
understatement :-).

Actually, there seems to be a very strong consensus that most of what you said
was irrelevant in the context of MIME/PEM.

Ned Freed says:

Therefore, the appropriate sequence of steps should be:

1. Canonicalize the text. (This isn't an entirely straightforward process,
even for simple RFC822 messages, as I will discuss below.)

2. Digitally sign the canonical text. Ideally, a WYSIWYG editor would display
the text in the canonical form for approval before the signature is applied.

3. Compress the text, using Lempel-Ziv or some other highly efficient
algorithm.

The order of steps 2 and 3 isn't especially important. You end up using both
in practice with MIME/PEM.

I have to disagree. If you sign the message after it has been compressed, then
you are depending on the trustworthiness of the compression code. I have seen
too many different flavors and versions of PKZIP and others to do so blindly.
It is bad enough that we have to rely on untrusted editors running on untrusted
operating systems (generally). Let's not deliberately make matters worse by
dragging in even more untrusted code.

You are still seeing compression in a hopelessly narrow context. In many cases
compression is part of the data format itself. There simply is no way you can
insert a signature into most compressed data formats. For one thing, many of
these formats are lossy and you do NOT recover the original data when you
decompress, so a signature covering the original data is meaningless.

This is not a service that requires a digital signature anyway, nor should a
signature be needed to obtain this service. All you need is an integrity check.
In the case of compression based on a content-transfer-encoding, this is provided
by the Content-MD5 header. In the case of compression implicit in some data
format it has to be provided by the format itself -- there is no way to do it
externally.

Note that since the integrity check is inside the signed object it ends up
getting signed along with the rest of it.
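
To make that concrete, here is a rough sketch (Python, purely for illustration)
of the kind of integrity check in question: a Content-MD5 value is just the
base64 encoding of the MD5 digest of the content, computed on the canonical
data before any transfer encoding is applied.

    import base64, hashlib

    def content_md5(canonical_body: bytes) -> str:
        # Content-MD5 is the base64 encoding of the MD5 digest of the
        # (canonical) content, computed before any transfer encoding.
        return base64.b64encode(hashlib.md5(canonical_body).digest()).decode("ascii")

    body = b"Some payload, possibly compressed within its own format.\r\n"
    print("Content-MD5:", content_md5(body))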

People who have been working with, using, and implementing MIME for several
years have already become fairly adept at this sort of stuff.

Amanda Walker writes:

I think that, while your concerns are valid, storage efficiency is a
tertiary criterion at best.  Communications channels continue to grow in
speed, as do mass storage capacities.  Disk space, for example, is
currently proceeding to drop below fifty cents a megabyte (I just bought a
527MB drive for $249 plus tax).  In this context, I think that the 33%
expansion brought about by radix 64 encoding is a relatively small cost.
I do not think that mandating compression (especially with algorithms that
may be patented) is worth the additional layers of complexity, and
additional time required to process messages.

I'm not terribly concerned about the radix 64 encoding, although I hate to be
forever chained to the lowest common denominator of transmission protocols.

I actually see this as a really serious issue. Rest assured that had we found
any acceptable alternative we would have used it.

I am much more concerned about the factor of 2 to 4 or more that is the usual
improvement in efficiency from text compression. It seems rather cavalier to
simply wave the back of our hand at such issues.

There is no handwaving here. These are real issues. They are not the most
important issues, but they are real issues nevertheless. However, it is
already well understood how these issues need to be dealt with, and it is also
well understood that this is an entirely independent matter from MIME/PEM.

Can't we please use the divide and conquer approach here? Can't we please
dispose of the matters at hand so some of us can proceed to devote time to
things like compression, which we've had to ignore because we have real limits
on how many problems we can work on at the same time? The ONLY reason that a
text compression solution isn't out in draft already is that the finalization
of both MIME and MIME/PEM has prevented me from having the necessary time to
work on it.

Every time that I have raised this issue, I have been told that it is
no problem, that MIME can handle compression. So now I'll ask it again --
DOES it, within the current PEM/MIME spec, and if so, how?

MIME/PEM doesn't deal with this, nor is it MIME/PEM's job to do so. The
issue of compression is purely a MIME issue -- you want to be able to
compress message content regardless of whether or not you sign or encrypt 
it or both.

Now I have somewhat mixed feelings. On the one hand, it seems to me that if
you transfer control from a MIME agent to a PEM agent for canonicalization
and signature, and then back to MIME for compression, and then back to PEM
again for encryption, and then either PEM or MIME for expansion to a 7 bit
format, then this seems terribly complex. On the other hand, I buy Amanda's
argument that the cryptographic operations should be insensitive to the
content type.

This isn't how it works in practice. In practice the MIME agent prepares
an object, which can be:

(1) Composite or atomic.
(2) Uncompressed, compressed within the data format itself, or compressed
    using an appropriate content-transfer-encoding. It is even possible that
    some new way of putting compression in could be added in the future.
(3) Integrity checked under the compression, either using the MIME mechanisms
    for this or within the data format itself.

This object is then signed or encrypted or both, and the result is repackaged
inside of a new MIME wrapper.

The only real complexity here is that this can be done multiple times at
multiple levels. This does require transfer of control back and forth between
agents, but since in all cases you're just passing around objects in a common
format (i.e. MIME) there really isn't a big interfacing problem.
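
A rough sketch of that flow, with Python's email library standing in for the
MIME agent. The "x-gzip64" transfer encoding is hypothetical (nothing of the
sort is registered), and the "signature" is just a digest placeholder, not the
actual MIME/PEM syntax; the point is only the shape of the layering.

    import base64, gzip, hashlib
    from email.message import Message
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText

    # Step 1: the MIME agent prepares an ordinary object. Here the body is
    # gzip-compressed and carried under a hypothetical "x-gzip64" transfer
    # encoding; the integrity check travels inside the object.
    body = ("A long text body that compresses well.\r\n" * 40).encode()
    inner = Message()
    inner["Content-Type"] = "text/plain; charset=us-ascii"
    inner["Content-Transfer-Encoding"] = "x-gzip64"   # illustrative, not registered
    inner["Content-MD5"] = base64.b64encode(hashlib.md5(body).digest()).decode()
    inner.set_payload(base64.b64encode(gzip.compress(body)).decode())

    # Step 2: a security agent signs the serialized object and repackages the
    # result inside a new MIME wrapper. The "signature" is only a stand-in.
    signature = hashlib.md5(inner.as_bytes()).hexdigest()
    outer = MIMEMultipart("signed")                   # structure only
    outer.attach(inner)
    outer.attach(MIMEText(signature))

    print(outer.as_string())

Verification or decryption just reverses the wrapping, which is why passing
objects back and forth between agents stays manageable.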

Ned continues:

As such, MIME/PEM's *only* responsibility here is to make sure that use of
security services doesn't break the ability to compress things. And MIME/PEM
has in fact been designed so that it doesn't. (This is not much of a feat, in
that I don't see how MIME/PEM could have compromised this aspect of MIME even
had it wanted to.)

Now, as for MIME facilities for compression. First of all, you very breezily
say stuff like, "a highly efficient algorithm like Lempel-Ziv". There is no
such thing, given the immense variety of types of data MIME deals with. One of
the conclusions we've come to as part of the MIME work is that there's just
no way to separate compression from the type of data you're working with.

This argues for including the compression in the data type, and this is in
fact what has been done. Types like image/tiff, video/mpeg, and
application/postscript include their own compression, and it often offers 10
to 100-fold improvement, whereas you'll actually get data size growth with
something like Lempel-Ziv on some of this stuff.
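
A quick illustration of that last point. Here zlib stands in for "something
like Lempel-Ziv", and random bytes stand in for data that is already
compressed within its own format (which is roughly what good compressor
output looks like statistically):

    import os, zlib

    text = ("The quick brown fox jumps over the lazy dog.\r\n" * 200).encode()
    already_compressed = os.urandom(len(text))   # stands in for JPEG/MPEG-style output

    for label, data in (("plain text", text), ("pre-compressed data", already_compressed)):
        out = zlib.compress(data, 9)
        print(label + ":", len(data), "->", len(out), "bytes")

The plain text shrinks dramatically; the pre-compressed data actually grows by
a few bytes of overhead.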

OK, I'll buy that too. Certainly video has both an X-Y and a time component
that must be considered -- the difference from pixel to pixel is likely to be
small in all three directions, and a general purpose compression algorithm
would be hard pressed to determine that kind of structure.

This is just one trick out of many. Audio compression, for example, can involve
the use of adaptive filtering techniques, where you either store the difference
between the actual and predicted signal or else store the coefficients that
cause the filter to produce the desired output.
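
A toy version of the predict-and-store-the-difference idea, with the previous
sample as a deliberately dumb predictor and zlib standing in for the entropy
coder. The signal and every parameter here are made up for illustration; real
codecs use far better predictors, but even this shows why residuals help:

    import math, random, zlib

    random.seed(1)
    # A smooth, slightly noisy "audio-like" signal: neighbours differ only a little.
    samples = [int(120 + 100 * math.sin(i / 50.0)) + random.randint(-3, 3)
               for i in range(5000)]
    raw = bytes(s & 0xFF for s in samples)

    # Predict each sample as the previous one and store only the residual,
    # which clusters tightly around zero and so codes much more compactly.
    residuals = bytes(((samples[i] - samples[i - 1]) if i else samples[0]) & 0xFF
                      for i in range(len(samples)))

    print("raw samples :", len(zlib.compress(raw, 9)), "bytes")
    print("residuals   :", len(zlib.compress(residuals, 9)), "bytes")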

JPEG uses discrete cosine transforms and then bandwidth-limits the
frequency-space result. And JPEG is nowhere near state-of-the-art -- the FBI,
for example, is using a wavelet transform technique for black and white
fingerprint data that gives them up to 50:1 compression without losing any
significant details. (Although initially smaller, black and white is usually
harder to compress than color, since there's much less redundancy in the data.)

Unfortunately this work hasn't extended to text yet. There are several
problems in this area:

(1) Should it be done with a content type or with an encoding? I prefer using
    an encoding since there are lots of "text-like" objects around.

My only concern, assuming that you agree with my argument about limiting the
amount of trusted code that we have to deal with, is that if we go from pure
MIME to an (indeterminate responsibility?) canonicalization, to PEM for a
digital signature, then back to MIME for content-dependent compression, then
back to PEM for optional encryption, followed by base 64 expansion if
necessary, we need to keep the necessary labels firmly attached to the object
so that we don't lose track of what should be done with it. From that
standpoint, it seems to me that a content type would be more suitable than an
encoding, if I understand what you are saying. Isn't the output of an encoding
just a bunch of undifferentiated bits?

This isn't how we use the word "encoding" in the MIME context. See RFC1521.
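
Put differently, an encoded body part keeps its labels: the receiving agent
reads the Content-Type and Content-Transfer-Encoding headers and knows exactly
what to undo and what it is left holding. A rough receiving-side sketch in
Python (the "x-gzip64" encoding is again hypothetical, used only to make the
layering visible):

    import base64, gzip, hashlib
    from email import message_from_string

    body = b"Canonical text, CRLF line breaks and all.\r\n" * 10
    raw = (
        "Content-Type: text/plain; charset=us-ascii\n"
        "Content-Transfer-Encoding: x-gzip64\n"
        "Content-MD5: " + base64.b64encode(hashlib.md5(body).digest()).decode() + "\n"
        "\n"
        + base64.b64encode(gzip.compress(body)).decode() + "\n"
    )

    msg = message_from_string(raw)
    assert msg.get_content_type() == "text/plain"   # the label survives the encoding
    decoded = gzip.decompress(base64.b64decode(msg.get_payload()))
    assert msg["Content-MD5"] == base64.b64encode(hashlib.md5(decoded).digest()).decode()
    print(decoded.decode().splitlines()[0])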

(2) Patent issues. This effectively means that gzip is probably the most
    viable.

This seems like another case of lowest common denominator. Sure, I always like
to have things for free, but if someone invents a better mousetrap and I have a
lot of mice to catch, I'm willing to pay him for his idea (just as, sooner or
later, we all work for a living and would like to continue to get paid). A
patent isn't an issue unless or until the royalties become excessive, and so
long as there are other algorithms that may not be quite as good but are
cheaper, this kind of extortionate monopoly isn't likely to happen.

Sigh. I should never have brought this up. It simply isn't relevant to the
current discussion. It is an issue in the work to define a compress
content-transfer-encoding, but only there, and I think there's broad agreement
on how to proceed.

(3) Specifications. Nobody has ever written a precise, detailed description of
    the algorithm suitable for publication as an RFC.

In short, all that's needed is a little more work. But nobody has done it,
nor is it likely to get done until some of us get some other tasks off of the
to-do list. (There's a little hint here...)

Well, I'm going to cop out here as well, as I don't consider myself an expert
in compression but just want to see it applied. As long as we understand what
the issues are in passing objects back and forth across agent boundaries, I'm
happy enough.

This we do understand, believe me. We deal with it all the time in the
MIME world.

With regard to the canonicalization problem, there is much more to
canonicalization than just the ASCII/EBCDIC and CR/LF issues. If the
message is straight text, then I would really like to see that all
nonprintable / printer-transparent characters are eliminated. This would
include, but is not limited to, the elimination of any backspace characters,
trailing blanks before a CR, and any trailing blank lines before a page eject
character.

This would greatly simplify the revalidation of a digital signature by
re-scanning the printed document, assuming a straight message format is
used.
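
Concretely, the stripping being proposed here might look something like the
following sketch (Python purely for illustration, and taking "page eject" to
mean a form feed). This illustrates the proposal only, not any existing PEM or
MIME canonicalization rule:

    import re

    def printer_canonicalize(text: str) -> str:
        # The stripping proposed above (sketch only): drop backspaces,
        # trailing blanks before a line break, and trailing blank lines
        # before a page eject (form feed).
        text = text.replace("\b", "")
        text = re.sub(r"[ \t]+(?=\r?\n)", "", text)
        text = re.sub(r"(\r?\n)+(?=\f)", "\r\n", text)
        return text

    print(repr(printer_canonicalize("total: 42   \r\nlast line\r\n\r\n\r\n\fpage 2\r\n")))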

Sorry, these are NOT canonicalization issues. Spaces at the ends of lines
can be significant, as can backspaces and all sorts of other stuff. Tabs
and spaces aren't equivalent either.

Sorry. Although I have argued this in the past (with Steve Kent) and lost, I'll
argue it again. Whenever the output of a message is reduced to a human-visible
form (on a screen or printer), then unless extraordinary steps are taken, such
as a hexadecimal dump, nonprinting characters, including trailing spaces, are
invisible. This makes it all too easy to mount a birthday-problem attack
against the message itself, where innocuous changes are made to the invisible
portions of the message until a version of the message is found that produces
an identical message digest to another message that is unfavorable to the
sender, but cannot be distinguished by the signature. If people are seriously
concerned about the possibility of a 56-bit DES key being broken by exhaustive
search, then the roughly 2**64 iterations necessary to have a 50% probability
of finding a message-digest-equivalent message without detection are also
possible (assuming a 128-bit message digest). And if a successful attack
against only 0.01% of the messages examined could reap substantial rewards, as
it might in electronic commerce, then the risk is much greater. (I agree with
tabs vs. spaces, by the way -- there are left justified tabs, right justified
tabs, decimal-point adjusted tabs, etc. But that brings up even more subtle
issues about tabular data in general. Things were simpler when messages were
printed on one-dimensional ticker tape.)
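
For the record, the arithmetic behind that 2**64 figure is just the standard
birthday bound for a 128-bit digest. A rough back-of-the-envelope calculation
(not an attack recipe), with Python used only as a calculator:

    import math

    N = 2.0 ** 128                      # possible values of a 128-bit digest

    def collision_probability(k):
        # Birthday approximation: p ~= 1 - exp(-k^2 / 2N) for k candidate messages.
        return 1.0 - math.exp(-(k * k) / (2.0 * N))

    print("p after 2**64 tries:", round(collision_probability(2.0 ** 64), 3))   # about 0.39
    print("tries for p = 0.5  :", math.sqrt(2.0 * N * math.log(2.0)) / 2.0 ** 64, "x 2**64")

About 1.18 x 2**64 trials gives an even 50%, so "roughly 2**64" is the right
order of magnitude.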

Amanda already responded to this, and I agree with her 100%.

                                Ned
