Re: checksums on open issues list

James M Galvin writes:

My oversight (and I assert yours also) was in not realizing that XXXX
should not assume heterogeneous environments.  Indeed, there is no reason one
user should not be able to send data in a "native form" to some other user.
Of course, this assumes the "native form" makes sense to both, but even if
it does not, forcing a canonical encoding to verify the checksum is working
for nothing.  Of course, it may be necessary to use a canonical encoding to
get the "native form" past various "broken" gateways, but this is a
different problem.


Whether or not you believe that native formats will be used within local
enclaves (and here I refer to 8-bit text forms as well as pure binary) we have
been under considerable pressure to keep RFC-XXXX neutral with respect to what
encodings are available as well as what encodings are used where. This has
required considerable work in some cases, but I still believe it is worth it,
if for no other reason than to keep the consensus behind RFC-XXXX in place.

I have also been one of the people who keep harping on the broken gateways out
there that cause problems in the handling of various native forms that are
quite a bit more restrictive than binary or 8-bit. Thus, it may seem
contradictory for me to endorse the availability of a strong integrity check of
any encoding while also endorsing message structures that are
gateway-resistant.

There is no contradiction here. To the extent that broken gateways exist, one
must be prepared to deal with them. This should include detection of problems
introduced by these gateways. And to the extent that gateways are not broken,
it should be possible to avoid excess work. If this means enhancing things to
operate one way within an enclave and another outside, so be it.

I believe we are all in agreement that an end-to-end service is the most
desirable, and therefore I agree with Dave that the checksum should not be
part of the transfer encoding.  I am neutral with respect to whether it
should be part of the content type or in a separate header.


I agree completely here. I don't care if a separate header is involved or not.
Frankly, my motivation for using an existing header was to avoid the "yet
another header" problem, which seems to bother some people. It has never
bothered me, but piling things on a single header is something I can live with
too.

Another important point is that if the checksum is applied to the "native
form", as a message crosses an "aware" gateway, it will be possible to both
transform the message into a "new native form" and to recompute the
checksum, which I believe to be a valuable service.


I agree completely -- in fact, a gateway is probably the entity that needs the
checksum the most. As a user, if I get a random bodypart in some random
encoding and an error has crept in, I run the application, it bombs, and I say
"gee, it don't work". In this case the checksum tells me the most likely reason
why "it don't work" -- there's an error in the data, but that's about it.

On the other hand, consider the problem a gateway faces. For example, say I
want to convert CDA (a particular compound document format used on VMS and
ULTRIX systems, among others) to ODA (this is what the X.400 world would prefer
to see). The converter I have available for this stuff tends to core dump on
flagrantly corrupt input streams. Now, a gateway should be able to handle a
converter core dump. But there's no particular need to handle it extremely
gracefully unless it happens all the time. I would like to avoid most core
dumps before they happen. An integrity check gives me just the sort of
information I need.
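
As a rough illustration of that check-before-convert step, here is a minimal
Python sketch. It also recomputes the check over the converted form, as
suggested for an "aware" gateway. SHA-256, the names compute_mic and
gateway_convert, and the converter callable are all assumptions made for the
example; nothing in RFC-XXXX specifies them.

```python
import hashlib
import hmac


class IntegrityError(Exception):
    """Raised when a body part does not match its declared integrity value."""


def compute_mic(body: bytes) -> str:
    # SHA-256 is an arbitrary choice for this sketch, not one the thread makes.
    return hashlib.sha256(body).hexdigest()


def gateway_convert(body: bytes, declared_mic: str, converter):
    """Verify the native-form body, convert it, and recompute the check.

    `converter` is a hypothetical callable (bytes -> bytes) standing in for
    something like a CDA-to-ODA converter.
    """
    if not hmac.compare_digest(compute_mic(body), declared_mic):
        # Refuse before the converter ever sees corrupt input.
        raise IntegrityError("body part failed integrity check")
    converted = converter(body)
    # The new value covers the new native form, not the original.
    return converted, compute_mic(converted)
```

A gateway would catch IntegrityError and bounce or flag the message rather
than hand the data to the converter at all.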

Oh, incidentally, CDA documents can be represented as short lines of plain text
with no trailing spaces. In general such a document should be able to pass
through most 7-bit paths I know of.

One final point that I have expressed to Ned and Neil privately.  "Checksum"
is the wrong term.  The service we are describing is a data integrity
service, more precisely a message integrity check or MIC.  A checksum is one
possible mechanism by which this service could be realized.  Another
mechanism is a hash algorithm.


Noted. The terminology is not my strong point, obviously.
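
For what it's worth, the difference between the two mechanisms is easy to
show. The sketch below is only an illustration: the 16-bit additive checksum
and the use of SHA-256 as the hash are my assumptions, not proposals for the
document. A simple sum cannot see two bytes being swapped, while a hash can.

```python
import hashlib


def additive_checksum(data: bytes) -> int:
    # Sum of all bytes, truncated to 16 bits. Byte order does not affect it.
    return sum(data) % 65536


def digest(data: bytes) -> str:
    # Hash over the same bytes. Any reordering changes the result.
    return hashlib.sha256(data).hexdigest()


original = b"short lines of plain text"
swapped = b"hsort lines of plain text"  # first two bytes transposed

same_checksum = additive_checksum(original) == additive_checksum(swapped)
same_digest = digest(original) == digest(swapped)
print(same_checksum, same_digest)  # True False
```

Either mechanism provides the integrity service; they differ in which
corruptions they can detect.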

                                Ned