Canonical forms (Was: Re: PEM/MIME Encryption)

If I could add my two cents worth to this discussion --

I am concerned about two things -- the efficiency of transmission of PEM and/or
PEM/MIME messages over high-speed modems, and our understanding of the semantic
content of complex objects that are signed and then validated in a different
environment than they were created in.

The new breed of modems that are available can achieve substantial improvement
in the effective transmission rate through the use of compression, often 50 to
100kbps using a 28kbps modem. The popular disk storage compression routines
provide similar savings. However, the unintelligent use of encryption will
prevent any compression from occurring, since well-encrypted data looks random
to a compressor, and in fact a moderate degree of message expansion may occur.
As a result, encrypted PEM or PEM/MIME messages will incur a substantial
performance penalty in transmission and storage.

Therefore, the appropriate sequence of steps should be:

1. Canonicalize the text. (This isn't an entirely straightforward process,
even for simple RFC822 messages, as I will discuss below.)

2. Digitally sign the canonical text. Ideally, a WYSIWYG editor would display
the text in the canonical form for approval before the signature is applied.

3. Compress the text, using Lempel-Ziv or some other highly efficient
algorithm.

4. Encrypt the output, with 8 bits in and 8 bits out.

5. Expand from 8-bit to a 7-bit or other code as required for transmission. (My
preference would be to have the definition of the encapsulated object that is
being transmitted stop at the encryption boundary, and have the
mail/transmission system cope with the vagaries of 7-bit, 6-bit, or tom-tom
encoding. I hate to see a lowest common denominator perpetuated in an
encapsulated object definition.)
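
The five steps above can be sketched as follows. This is only an illustration
under stated assumptions: the function names are hypothetical, HMAC-SHA256
stands in for a real public-key signature, a toy XOR keystream stands in for a
real cipher (it is NOT secure), zlib stands in for "Lempel-Ziv or some other
highly efficient algorithm", and base64 stands in for the 8-bit to 7-bit
expansion. None of this is taken from the PEM/MIME spec.

```python
import base64
import hashlib
import hmac
import zlib


def canonicalize(text: str) -> bytes:
    """Step 1 (simplified): CRLF line endings, no trailing blanks."""
    lines = [line.rstrip(" \t") for line in text.splitlines()]
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


def sign(data: bytes, key: bytes) -> bytes:
    """Step 2 stand-in: HMAC-SHA256, not a real public-key signature."""
    return hmac.new(key, data, hashlib.sha256).digest()


def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Step 4 stand-in: XOR with a SHA-256 counter keystream. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def protect(text: str, sign_key: bytes, enc_key: bytes) -> bytes:
    canonical = canonicalize(text)                  # 1. canonicalize
    signed = sign(canonical, sign_key) + canonical  # 2. sign (32-byte tag first)
    compressed = zlib.compress(signed, 9)           # 3. compress
    encrypted = toy_encrypt(compressed, enc_key)    # 4. encrypt, 8 bits in and out
    return base64.b64encode(encrypted)              # 5. expand for 7-bit transport


def unprotect(blob: bytes, sign_key: bytes, enc_key: bytes) -> bytes:
    # XOR with the same keystream undoes the toy encryption.
    compressed = toy_encrypt(base64.b64decode(blob), enc_key)
    signed = zlib.decompress(compressed)
    tag, canonical = signed[:32], signed[32:]
    if not hmac.compare_digest(tag, sign(canonical, sign_key)):
        raise ValueError("signature check failed")
    return canonical


# Compare the two orderings on a repetitive message.
message = "The quick brown fox jumps over the lazy dog.   \n" * 200
good_order = toy_encrypt(zlib.compress(canonicalize(message)), b"k")
bad_order = zlib.compress(toy_encrypt(canonicalize(message), b"k"))
print(len(good_order), len(bad_order))
```

The last three lines show the point about ordering: compressing before
encrypting leaves a small result, while compressing the already-encrypted
bytes gains nothing.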

Every time that I have raised this issue, I have been told that it is no
problem, that MIME can handle compression. So now I'll ask again -- DOES it,
within the current PEM/MIME spec, and if so, how?

With regard to the canonicalization problem, there is much more to
canonicalization than just the ASCII/EBCDIC and CR/LF issues. If the message is
straight text, then I would really like to see that all nonprintable /
printer-transparent characters are eliminated. This would include, but is not
limited to, the elimination of any backspace characters, and trailing blanks
before a CR, and any trailing blank lines before a page eject character. This
would greatly simplify the revalidation of a digital signature by re-scanning
the printed document, assuming a straight message format is used.
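
As a rough sketch of the kind of text canonicalization described above, the
following reduces a plain-text message to what would be visible on the printed
page. The function name is hypothetical, and the exact rules (CRLF line ends,
tabs kept inside a line, form feed as the page eject character) are
assumptions of this sketch rather than anything the PEM spec defines.

```python
import re


def canonicalize_text(text: str) -> str:
    """Reduce plain text to what would be visible on a printed page."""
    # Treat CRLF, bare CR, and bare LF alike, then split into pages at form feeds.
    pages = text.replace("\r\n", "\n").replace("\r", "\n").split("\f")
    out_pages = []
    for page in pages:
        lines = []
        for line in page.split("\n"):
            # Drop backspaces and other control characters. Tabs inside a line
            # are kept, which is a choice of this sketch only.
            line = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", line)
            lines.append(line.rstrip(" \t"))  # trailing blanks before the line end
        while lines and lines[-1] == "":      # trailing blank lines before a page eject
            lines.pop()
        out_pages.append("\r\n".join(lines))
    return "\r\n\f".join(out_pages) + "\r\n"


a = canonicalize_text("Hi  \nthere\x08!\t\n\n\n\fPage 2 \n")
b = canonicalize_text("Hi \r\nthere!\n\fPage 2")
print(a == b)
```

Two inputs that would print identically end up as the same byte string, so a
signature computed over one can be checked against the other.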

Once we get outside the straight RFC822 environment, then the issue of foreign
alphabets arises. Even within the Roman alphabets, how do we handle the
relatively simple case of umlauts, s-zet, c-cedilla, n-tildes, etc? In many
cases, the way that this is handled is by defining nonspacing characters which
PRECEDE the characters that they modify. Needless to say, this plays hob with
sorting and searching. Once you start addressing non-Roman alphabets, the
problem becomes much worse, with 16-bit codes having to be used in many cases.
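
To make the accented-character problem concrete, here is a small sketch using
Unicode, which is my assumption; the discussion above does not name a
character set. Note that Unicode places combining marks AFTER the base letter,
unlike the preceding convention mentioned above, but the effect on a signature
is the same: two encodings of the same visible text give different digests
unless one form is chosen first. SHA-256 and the helper names are likewise
illustrative only.

```python
import hashlib
import unicodedata

precomposed = "M\u00fcller"  # u-umlaut as a single code point
decomposed = "Mu\u0308ller"  # plain u followed by a combining diaeresis


def digest(text: str) -> str:
    """Hash the text exactly as encoded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def canonical_digest(text: str) -> str:
    """Hash the text after normalizing to one composed form (NFC)."""
    return digest(unicodedata.normalize("NFC", text))


same_raw = digest(precomposed) == digest(decomposed)
same_canonical = canonical_digest(precomposed) == canonical_digest(decomposed)
print(same_raw, same_canonical)
```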

More complex objects have their own special set of problems. For example, what
is the canonical form of a PostScript file? Are the definitions of all of the
fonts supposed to be included? If so, Adobe and others may sue for copyright
violations of their fonts. If not, then how does the PostScript interpreter
know how to translate a given code into the corresponding glyph? Over the last
several years there has been a reduction in the variability of font encodings,
but a number of variations still remain, especially with the so-called expert
fonts. If you don't think this is important, consider substituting the yen sign
for the dollar sign in your next paycheck.

As if the problem of fonts weren't enough, what about the header files that can
be downloaded in advance? Windows uses one type, the Mac uses another. And
remember that the PostScript language is a quite powerful PROGRAMMING LANGUAGE,
and in many cases it has access to your screen, the hard disk in your computer,
and certainly the memory and hard disk (if any) of your printer. (I haven't yet
seen a PostScript virus, but I expect to see one any day.) From the standpoint
of nonrepudiation (and why else would we be using a digital signature?), if the
complete header and all of the fonts are not included in the PostScript file,
the results are indeterminate. Surely the minimum that canonicalization should
do is assure that the results are completely determinate, but it beats me how
to accomplish this in general. Maybe we should transform them into Acrobat
before signing them.

I'm not familiar with the internal workings of JPEG, MPEG, and GIF files, nor
with the various sound files, especially MIDI files that are being used these
days. But I suspect that those approaches share some close similarities to
PostScript files, and may have the same set of problems.

Assuming that the purpose of canonicalization is to ensure that the same
results, i.e., the same semantic content, will be implied by a signature across
various platforms, then I think we have to at least stop and think a bit about
what the semantic content of a complex MIME object really IS, and what we are
implying when we sign it.

Suppose that I sign a JPEG-encoded photo. What does that mean? Is it a picture
of me? Did I take the picture? Is the picture a faithful representation of some
real-world object? All anyone really knows is that the encoded photo hasn't been
modified since I ran it through my fingers (metaphorically). Of course, if I
add some explanatory text and bind it to the object by signing the complex
object that will help, but now, presumably, we have to canonicalize the complex
object as a whole. 

In summary, I am very concerned that we understand the implications of signing
a bucket of bits. I'm confident that the PEM/MIME spec does a reasonably good
job of describing the syntax of these complex objects. I have much less
confidence that we have a good handle on the semantics.

Bob
--------------------------------
Robert R. Jueneman
Staff Scientist
Wireless and Secure Systems Laboratory
GTE Laboratories
40 Sylvan Road
Waltham, MA 02254
Internet: Jueneman(_at_)gte(_dot_)com
FAX: 1-617-466-2603 
Voice: 1-617-466-2820
