a checksum summary?

I have been watching the messages the past few days and noting references to
checksums and I would like to try to create a summary of what I believe to
be the important elements.  This, of course, represents my own opinion.

Included in the comments I make below, I state the service objective and
suggest a simple scheme for the inclusion of a checksum.

1. I mentioned this at the end of one of my earlier notes but it is
   important enough that I will mention it first this time.  The objective
   is to support a data integrity service in 822.  I believe this is an
   important and useful service, independent of support for a similar
   service that may or may not appear in SMTP (see below for more on this).
   One mechanism by which one realizes a data integrity service is a
   checksum.  However, it has also been suggested that MD4 or MD5 could be
   used.  These algorithms are not checksums; they are hash algorithms.
   Thus, we should not be talking about checksums, but a message integrity
   check (or MIC).  This is the term that generically refers to both kinds
   of values.
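
   To illustrate the distinction, here is a minimal sketch in Python.  The
   additive checksum and the sample strings are assumptions made up for the
   example; only MD5 is named in the discussion.  Both functions yield a
   value that could serve as a MIC, but they differ in what they detect.

```python
import hashlib


def additive_checksum(data: bytes) -> int:
    """Toy 16-bit additive checksum: sum of the bytes modulo 65536."""
    return sum(data) % 65536


def md5_mic(data: bytes) -> str:
    """MD5 hash of the data, as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Two messages that differ only by a pair of transposed characters.
original = b"pay 12 dollars"
altered = b"pay 21 dollars"

# The additive checksum cannot tell them apart; the hash can.
same_checksum = additive_checksum(original) == additive_checksum(altered)
same_hash = md5_mic(original) == md5_mic(altered)
```

   Here same_checksum is true and same_hash is false, since a simple sum
   ignores byte order while the hash does not.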

2. I would not recommend depending on a MIC service in SMTP.

   a. Although the specification may exist in the short term, the deployment
      of this service can never compare to the deployment of an 822
      extensions user agent.

   b. SMTP is point-to-point and limited in its scope.  822 mail extends
      beyond the Internet, that is beyond SMTP.  Providing the service in
      822 guarantees an end-to-end service, which would not be possible in
      SMTP.

   c. There is potential for a non-trivial amount of processing to occur
      between the receipt of a message by a user agent and its receipt by
      the local MTA.  While one normally does not expect that a message will
      be altered in the local environment, random disk errors are not
      unheard of.  Supporting the service in 822, more precisely in the user
      agent, provides some additional assurance at a relatively low cost.

3. A MIC calculation is completely independent of the content-transfer
   encoding and the content-type.  The service objective is to ensure that the
   content-type received is the content-type that was sent.

   In part, the reason the content-transfer-encoding exists is to support
   this service.  An originator may know in advance to choose a lowest
   common denominator representation (e.g., base64) of the message to ensure
   the integrity of the message, or a gateway can alter a message's
   representation based on its knowledge of the capabilities of the
   receiving network.

   A MIC can be calculated on any content-type and verified for any
   content-type.  A *critical* issue is whether or not the content-type can
   be represented in the same form in both the originator's and recipient's
   environment.  This issue exists irrespective of the existence of a data
   integrity service.  An originator always presumes a recipient can
   "receive" the message being sent.  Thus, compute the MIC on the message
   in its native form and send it to the recipient.  The recipient can
   verify the MIC and then decide what to do with the message.

   I agree that text is a special case, since the conversion from ASCII to
   EBCDIC and back is essentially broken in the general case.  I sense there
   is not much concern about this issue in particular.  Rather, folks are
   more concerned with a MIC on content-types other than TEXT.  With this in
   mind, I suggest we do not resolve this issue and note that PEM solves
   this problem.

4. A good question to ask is when should the MIC be calculated.  I suggest
   during origination the calculation be done after any processing
   associated with the content-type header and just prior to the
   content-transfer-encoding processing.  Upon receipt the reverse applies:
   after the content-transfer-encoding processing but prior to any
   processing associated with the content-type header.
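
   The ordering in item 4 can be sketched roughly as follows, in Python.
   This is an illustration under assumptions, not a proposed
   implementation: base64 stands in for the content-transfer-encoding, MD5
   for the MIC algorithm, and the function names originate and receive are
   hypothetical.

```python
import base64
import hashlib


def originate(content: bytes) -> tuple[bytes, str]:
    """Originator side: MIC first, transfer encoding second."""
    # Compute the MIC on the content in its native form.
    mic = hashlib.md5(content).hexdigest()
    # Only then apply the content-transfer-encoding (base64 assumed here).
    encoded = base64.encodebytes(content)
    return encoded, mic


def receive(encoded: bytes, mic: str) -> bytes:
    """Recipient side: undo the transfer encoding, then verify the MIC."""
    decoded = base64.decodebytes(encoded)
    if hashlib.md5(decoded).hexdigest() != mic:
        raise ValueError("MIC mismatch: content was altered in transit")
    # Content-type processing would begin here, after verification.
    return decoded
```

   Because the MIC covers the decoded content, a gateway that changes the
   transfer encoding along the way does not invalidate it.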

5. Where should the MIC appear?  I believe there are two choices: as an
   attribute of the content-type header or as a separate header, e.g.,
   Content-MIC-Value.  Keep in mind there are two critical pieces of
   information: the MIC algorithm used and the MIC value.  You will probably
   want to recommend one algorithm, for interoperability reasons, but you
   need not register algorithm object identifiers.  This is already done by
   the revision to RFC 1115, which in fact references all the definitive
   specifications for the algorithms.  There are other places you can point
   for algorithm IDs, also.
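
   As a sketch of the separate-header choice, the following Python shows
   one way the two pieces of information could travel together.  The header
   name Content-MIC-Value is the example given above; the "algorithm; value"
   layout, the hexadecimal encoding of the value, and both function names
   are assumptions made for illustration only.

```python
import hashlib


def format_mic_header(algorithm: str, content: bytes) -> str:
    """Build a header line carrying the algorithm name and the MIC value."""
    digest = hashlib.new(algorithm, content).hexdigest()
    return f"Content-MIC-Value: {algorithm}; {digest}"


def parse_mic_header(line: str) -> tuple[str, str]:
    """Split a header line back into (algorithm, value)."""
    name, _, rest = line.partition(":")
    if name.strip().lower() != "content-mic-value":
        raise ValueError("not a Content-MIC-Value header")
    algorithm, _, value = rest.partition(";")
    return algorithm.strip(), value.strip()


def verify(line: str, content: bytes) -> bool:
    """Recompute the MIC with the named algorithm and compare."""
    algorithm, value = parse_mic_header(line)
    return hashlib.new(algorithm, content).hexdigest() == value
```

   Carrying the algorithm name alongside the value lets a recipient verify
   the MIC without assuming which algorithm the originator chose.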

Jim