Re: checksums

Reply-To: James M Galvin <galvin(_at_)tis(_dot_)com>
To: ietf-822(_at_)dimacs(_dot_)rutgers(_dot_)edu
Subject: checksums
Date: Wed, 30 Oct 91 16:02:51 -0500
From: James M Galvin <galvin(_at_)tis(_dot_)com>

I will back off from insisting on a more flexible and general solution to the
checksum "requirement".  However, I do have three comments about adding a
checksum solution:

1.  The checksum must be based on a canonical form of the data.  One
    candidate canonical form is the base64 encoding.  In other words, it
    would be very bad to base the checksum on the underlying binary (or
    local) representation.  See RFC 1113 for the rationale.


While I agree that the checksum has to be based on a canonical form of the
data, I don't think that canonical form should be the base64-encoded form
of the data.  I would prefer to compute the checksum on the data immediately
before it is encoded in base64 (or whatever).

Reasons:  First, there's no obvious canonical form of a base64-encoded data
stream (since line lengths can vary), so two identical streams of unencoded
data could have different checksums.  Second, defining the checksum in this
way discourages use of checksums with other content-transfer-encodings.
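To make the line-length point concrete, here is a small Python sketch.  It
assumes MD5 as the checksum purely for illustration (no algorithm has been
named), and the helper `wrap` and the sample payload are hypothetical.  It
encodes the same bytes as base64 with 64- and 76-character lines: checksums
over the encoded text differ, while checksums over the decoded data agree.

```python
import base64
import hashlib

def checksum(data: bytes) -> str:
    # MD5 is an assumed stand-in for whatever checksum is eventually chosen.
    return hashlib.md5(data).hexdigest()

def wrap(encoded: bytes, width: int) -> bytes:
    # Hypothetical helper: break a base64 string into CRLF-terminated lines
    # of at most `width` characters.
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return b"".join(line + b"\r\n" for line in lines)

data = bytes(range(256)) * 4          # arbitrary sample payload
raw = base64.b64encode(data)          # base64 with no line breaks at all

enc_64 = wrap(raw, 64)                # two legal encodings of the same data,
enc_76 = wrap(raw, 76)                # differing only in line length

# Checksums computed over the encoded text disagree...
print(checksum(enc_64) == checksum(enc_76))   # False
# ...but b64decode discards the line breaks, so checksums over the decoded
# (canonical) data agree with each other and with the original payload.
print(checksum(base64.b64decode(enc_64)) == checksum(base64.b64decode(enc_76)) == checksum(data))   # True
```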

While it is true that for some kinds of data (e.g. "text" files) the local
form will frequently differ from the canonical form, for many other kinds
of data the two will be identical.

Part of the specification for any particular content type should be a
description of what the data looks like in canonical form, prior to
any content-transfer-encoding.  Mail composers will have to translate
from local form to canonical form, when necessary, before encoding in
base64 or whatever; mail readers will have to translate from canonical
form to local form after decoding.
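As an illustration of what such a per-type description might translate into,
here is a hypothetical Python sketch.  It assumes a Unix-style local form
where text lines end in LF, a canonical text form with CRLF line breaks, and
that every non-text type is already in canonical form; the function names
are invented for this example.

```python
def to_canonical(content_type: str, local: bytes) -> bytes:
    """Composer side: local form -> canonical form (hypothetical)."""
    if content_type.lower().startswith("text/"):
        # Collapse any existing CRLF first so it isn't doubled, then use CRLF.
        return local.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return local  # assumed: non-text local form is already canonical

def to_local(content_type: str, canonical: bytes) -> bytes:
    """Reader side: canonical form -> local form (hypothetical)."""
    if content_type.lower().startswith("text/"):
        return canonical.replace(b"\r\n", b"\n")
    return canonical

text = b"line one\nline two\n"
canon = to_canonical("text/plain", text)
print(canon)                                  # b'line one\r\nline two\r\n'
print(to_local("text/plain", canon) == text)  # True
```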

Here's how I see the encoding/decoding process, along with checksum
generation, using base64 as an example:



          +----------------+   +----------+     +------------+
sender's  |   convert to   |   | compute  |---->| convert to |     base64
 local -->| canonical form |-->| checksum |-+   |   base64   |--> encoding
 form     +----------------+   +----------+ |   +------------+
                                            +-------------------> checksum

                           Encoding in base64
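The sending side of the diagram might look like the following Python sketch.
As above, MD5 stands in for the unspecified checksum, the LF-to-CRLF rule is
an assumed canonical form for text, and `encode_body` is a hypothetical name.
The key point is the ordering: the checksum is taken after canonicalization
and before base64 encoding, and it is returned alongside the encoding rather
than embedded in it.

```python
from __future__ import annotations

import base64
import hashlib

def encode_body(local: bytes, is_text: bool) -> tuple[bytes, str]:
    """Return (base64 encoding, checksum) for a body part (hypothetical)."""
    # 1. Convert local form to canonical form (assumed rule: LF -> CRLF for text).
    if is_text:
        canonical = local.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    else:
        canonical = local
    # 2. Compute the checksum on the canonical data, before any encoding.
    checksum = hashlib.md5(canonical).hexdigest()
    # 3. Base64-encode the canonical data.  encodebytes() emits 76-character
    #    lines ending in LF; a real mailer would still send them as CRLF.
    encoding = base64.encodebytes(canonical)
    return encoding, checksum

encoding, checksum = encode_body(b"hello\nworld\n", is_text=True)
print(encoding)   # b'aGVsbG8NCndvcmxkDQo=\n'
print(checksum)
```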



            +--------+     +----------------+   +------------+
 base64     | decode |---->| compute,verify |-->| convert to |--> recipient's
encoding -->| base64 |  +->|    checksum    |   | local form |    local form
            +--------+  |  +----------------+   +------------+
checksum ---------------+

                        Decoding base64 body parts
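The receiving side can be sketched the same way, under the same assumptions
(MD5 as the checksum, CRLF-to-LF as the conversion to local text form, and
the hypothetical names `decode_body` and `ChecksumMismatch`).  Verification
happens on the decoded canonical data, before conversion to local form, so
the result doesn't depend on how the sender broke the base64 lines or on the
recipient's local conventions.

```python
import base64
import hashlib

class ChecksumMismatch(Exception):
    """Raised when a decoded body part fails checksum verification."""

def decode_body(encoding: bytes, checksum: str, is_text: bool) -> bytes:
    """Decode, verify, and localize a base64 body part (hypothetical)."""
    # 1. Decode base64; by default b64decode discards line breaks and other
    #    characters outside the base64 alphabet.
    canonical = base64.b64decode(encoding)
    # 2. Compute the checksum on the canonical data and compare.
    if hashlib.md5(canonical).hexdigest() != checksum.lower():
        raise ChecksumMismatch("decoded body does not match checksum")
    # 3. Convert canonical form to local form (assumed rule: CRLF -> LF for text).
    return canonical.replace(b"\r\n", b"\n") if is_text else canonical

# Usage: build a sample encoding and checksum the way a sender would.
sample = b"hello\r\nworld\r\n"
print(decode_body(base64.encodebytes(sample), hashlib.md5(sample).hexdigest(), is_text=True))
# b'hello\nworld\n'
```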



Is this inconsistent with either your expectations or RFC 1113?

-Keith