pem-dev

Re: Canonical forms (Was: Re: PEM/MIME Encryption)

1994-12-20 23:55:00
To say that there is a lack of consensus on what I proposed would be a mild
understatement :-).  However, at the risk of flogging the moribund equine:

Ned Freed says:

Therefore, the appropriate sequence of steps should be:

1. Canonicalize the text. (This isn't an entirely straightforward process,
even for simple RFC822 messages, as I will discuss below.)

2. Digitally sign the canonical text. Ideally, a WYSIWYG editor would display
the text in the canonical form for approval before the signature is applied.

3. Compress the text, using Lempel-Ziv or some other highly efficient
algorithm.

The order of steps 2 and 3 isn't especially important. You end up using both
in practice with MIME/PEM.

I have to disagree. If you sign the message after it has been compressed, then
you are depending on the trustworthiness of the compression code. I have seen
too many different flavors and versions of PKZIP and others to do so blindly.
It is bad enough that we have to rely on untrusted editors running on untrusted
operating systems (generally). Let's not deliberately make matters worse by
dragging in even more untrusted code.
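
To make that ordering concrete, here is a minimal sketch of the sign-before-
compress pipeline (my own illustration, not the MIME/PEM processing model; the
private_key object is hypothetical, and MD5 merely stands in for whatever
message digest is actually used):

    import zlib
    from hashlib import md5

    def canonicalize(text):
        # RFC 822 canonical form: lines delimited by CRLF (character-set
        # questions are ignored in this sketch)
        return "\r\n".join(text.splitlines()).encode("ascii") + b"\r\n"

    def prepare_for_transmission(text, private_key):
        canonical = canonicalize(text)                         # step 1
        signature = private_key.sign(md5(canonical).digest())  # step 2
        # The signature covers only the canonical text, so the compressor
        # in step 3 stays outside the trusted path: a buggy or malicious
        # compressor can mangle the data, but it cannot forge anything the
        # signature attests to.
        compressed = zlib.compress(canonical)                  # step 3
        return compressed, signature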

Amanda Walker writes:

I think that, while your concerns are valid, storage efficiency is a tertiary 
criterion at best.  Communications channels continue to grow in speed, as do
mass storage capacities.  Disk space, for example, is currently dropping below
fifty cents a megabyte (I just bought a 527MB drive for $249 plus
tax).  In this context, I think that the 33% expansion brought about by radix 
64 encoding is a relatively small cost.  I do not think that mandating 
compression (especially with algorithms that may be patented) is worth the 
additional layers of complexity, and additional time required to process 
messages.

I'm not terribly concerned about the radix 64 encoding, although I hate to be
forever chained to the lowest common denominator of transmission protocols. I
am much more concerned about the factor of 2 to 4 or more that is the usual
improvement in efficiency from text compression. It seems rather cavalier to
simply wave such issues away with the back of our hand.

Also, remember that text is not the only content type we want to be able to 
sign and/or encrypt.  Keeping the cryptographic operations separate from the 
content-related operations is a good thing, in my opinion.

Well, I do buy that argument.

Every time I have raised this issue, I have been told that it is no problem,
that MIME can handle compression. So now I'll ask it again -- DOES it, within
the current PEM/MIME spec, and if so, how?

MIME/PEM doesn't deal with this, nor is it MIME/PEM's job to do so. The issue
of compression is purely a MIME issue -- you want to be able to compress
message content regardless of whether you sign it, encrypt it, or both.

Now I have somewhat mixed feelings. On the one hand, transferring control from
a MIME agent to a PEM agent for canonicalization and signature, then back to
MIME for compression, then back to PEM again for encryption, and then to either
PEM or MIME for expansion to a 7-bit format seems terribly complex. On the
other hand, I buy Amanda's argument that the cryptographic operations should be
insensitive to the content type.

Ned continues:

As such, MIME/PEM's *only* responsibility here is to make sure that use of
security services doesn't break the ability to compress things. And MIME/PEM
has in fact been designed so that it doesn't. (This is not much of a feat, in
that I don't see how MIME/PEM could have compromised this aspect of MIME even
had it wanted to.)

Now, as for MIME facilities for compression. First of all, you very breezily
say stuff like, "a highly efficient algorithm like Lempel-Ziv". There is no
such thing, given the immense variety of types of data MIME deals with. One of
the conclusions we've come to as part of the MIME work is that there's just
no way to separate compression from the type of data you're working with.

This argues for including the compression in the data type, and this is in
fact what has been done. Types like image/tiff, video/mpeg, and
application/postscript include their own compression, and it often offers a
10- to 100-fold improvement, whereas you'll actually get data size growth with
something like Lempel-Ziv on some of this stuff.

OK, I'll buy that too. Certainly video has both an X-Y and a time component
that must be considered -- the difference from pixel to pixel is likely to be
small in all three directions, and a general purpose compression algorithm
would be hard pressed to determine that kind of structure.

Unfortunately this work hasn't extended to text yet. There are several
problems in this area:

(1) Should it be done with a content type or with an encoding? I prefer using
   an encoding since there are lots of "text-like" objects around.

My only concern, assuming that you agree with my argument about limiting the
amount of trusted code that we have to deal with, is that as the object goes
from a pure MIME canonicalization (or is that an indeterminate responsibility?)
to PEM for the digital signature, then back to MIME for content-dependent
compression, then back to PEM for optional encryption, followed by base 64
expansion if necessary, we keep the necessary labels firmly attached to it so
that we don't lose track of what should be done with it. From that standpoint,
it seems to me that a content type would be more suitable than an encoding, if
I understand what you are saying. Isn't the output of an encoding just a bunch
of undifferentiated bits?
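
As a way of thinking about that labeling question, here is a rough sketch
contrasting the two approaches (the Python email library is only a stand-in,
and the "gzip" subtype, the "original-type" parameter, and the "x-gzip"
encoding token are all hypothetical -- none of them is defined in MIME or in
the PEM drafts):

    import gzip
    from email.mime.application import MIMEApplication
    from email.mime.text import MIMEText

    body = "Canonical, signed text would go here.\r\n"
    compressed = gzip.compress(body.encode("ascii"))

    # Compression labeled as a content type: the compressed part carries its
    # own label, and the original type can ride along as a parameter, so
    # downstream agents still know what they are holding.
    as_type = MIMEApplication(compressed, _subtype="gzip")    # hypothetical subtype
    as_type.set_param("original-type", "text/plain")          # hypothetical parameter

    # Compression labeled as a transfer encoding: the part remains text/plain
    # and only the encoding field records that the octets are compressed.
    as_encoding = MIMEText(body, "plain")
    as_encoding.replace_header("Content-Transfer-Encoding", "x-gzip")  # hypothetical token
    as_encoding.set_payload(compressed)

    print(as_type.items())
    print(as_encoding.items())

With the content-type approach the compressed part still announces what it is;
with the encoding approach, everything outside the encoding field sees only
undifferentiated octets.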

(2) Patent issues. This effectively means that gzip is probably the most
   viable.

This seems like another case of the lowest common denominator. Sure, I always
like to have things for free, but if someone invents a better mousetrap and I
have a lot of mice to catch, I'm willing to pay him for his idea (after all,
sooner or later we all work for a living and would like to continue to get
paid). A patent isn't an issue unless or until the royalties become excessive,
and so long as there are other algorithms that may not be quite as good but
are cheaper, that kind of extortionate monopoly isn't likely to happen.

(3) Specifications. Nobody has ever written a precise, detailed description of
   the algorithm suitable for publication as an RFC.

In short, all that's needed is a little more work. But nobody has done it,
nor is it likely to get done until some of us get some other tasks off of the
to-do list. (There's a little hint here...)

Well, I'm going to cop out here as well, since I don't consider myself an expert
in compression but just want to see it applied. As long as we understand what
the issues are in passing objects back and forth across agent boundaries, I'm
happy enough.

With regard to the canonicalization problem, there is much more to
canonicalization than just the ASCII/EBCDIC and CR/LF issues. If the message is
straight text, then I would really like to see that all nonprintable /
printer-transparent characters are eliminated. This would include, but is not
limited to, the elimination of any backspace characters, any trailing blanks
before a CR, and any trailing blank lines before a page eject character. This
would greatly simplify the revalidation of a digital signature by re-scanning
the printed document, assuming a straight message format is used.
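
(For concreteness, the kind of scrubbing described above might look roughly
like the following sketch, assuming plain ASCII text with CRLF line breaks;
this is only an illustration of that proposal, not anything taken from the
PEM or MIME specifications.)

    def printable_canonical_form(text):
        # Strip the characters a printer or screen would swallow, so that
        # what is signed is exactly what a human reader can see.
        lines = []
        for line in text.split("\r\n"):
            line = line.replace("\b", "")   # eliminate backspace characters
            line = line.rstrip(" ")         # eliminate trailing blanks before the CR
            lines.append(line)
        while lines and lines[-1] == "":    # eliminate trailing blank lines
            lines.pop()                     # (e.g. before a page eject)
        return "\r\n".join(lines) + "\r\n"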

Sorry, these are NOT canonicalization issues. Spaces at the ends of lines can
be significant, as can backspaces and all sorts of other stuff. Tabs and
spaces aren't equivalent either.

Sorry. Although I have argued this in the past (with Steve Kent) and lost, I'll
argue it again. Whenever the output of a message is reduced to a human visible
form (on a screen or printer), then unless extraordinary steps are taken such
as a hexadecimal dump, nonprinting characters, including trailing spaces, are
invisible. This makes it all too easy to mount a birthday-problem attack
against the message itself, where innocuous changes are made to the invisible
portions of the message until a version is found whose message digest is
identical to that of another message -- one unfavorable to the sender -- yet
the two cannot be distinguished by the signature. If people are seriously
concerned about the possibility of a 56-bit DES key being broken by exhaustive
search, then the roughly 2**64 iterations needed for a 50% probability of
finding a message-digest-equivalent message without detection (assuming a
128-bit message digest) are also within reach. And if a successful attack
against only 0.01% of the messages examined could reap substantial rewards, as
it might in electronic commerce, then the risk is much greater. (I agree about
tabs vs. spaces, by the way -- there are left-justified tabs, right-justified
tabs, decimal-point-adjusted tabs, etc. But that brings up even more subtle
issues about tabular data in general. Things were simpler when messages were
printed on one-dimensional ticker tape.)
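
As a back-of-the-envelope check on that 2**64 figure, the standard birthday
approximation can be worked out in a few lines (a sketch only, nothing
specific to PEM or to any particular digest algorithm):

    from math import exp, log, sqrt

    def collision_probability(trials, digest_bits):
        # Birthday approximation: p ~ 1 - exp(-n**2 / (2 * 2**bits))
        return 1.0 - exp(-(float(trials) ** 2) / (2.0 * 2.0 ** digest_bits))

    print(collision_probability(2 ** 64, 128))   # about 0.39
    print(sqrt(2 * log(2)) * 2.0 ** 64)          # ~1.18 * 2**64 trials for a
                                                 # 50% chance of a collision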

[Discussion of canonicalization of complex objects deleted.]

We have stopped and thought about this, in considerable detail, for several
years now. I cannot help the fact that it wasn't done on this list so you
could see it, however.

OK, I'll take your word for it. Like everyone else, I can't possibly read
everything, much less be an expert on it.  (If it breaks, however, you'll owe
me an I told you so! ;-)

Now you're dragging out the old issue of what a signature means. Please stop!

OK. But if signatures don't mean anything, why are we trying so hard to
affix them to messages?


Bob


--------------------------------
Robert R. Jueneman
Staff Scientist
Wireless and Secure Systems Laboratory
GTE Laboratories
40 Sylvan Road
Waltham, MA 02254
Internet: Jueneman@gte.com
FAX: 1-617-466-2603 
Voice: 1-617-466-2820

