On Sat, November 6, 2004 20:37, ned+ietf-822(_at_)mrochek(_dot_)com wrote:
* it should be easy to combine and separate the hashes
Only hash combination is needed, not separation.
That depends on the goals; if signature validity checking is
to be limited to a binary something-changed/nothing-changed
result, then you're right. However, if one wants to be able
to say that these specific N objects are unchanged and those
specific M objects have been altered, then the receiver will
need access to the individual hashes.
And if one is going to settle for a binary result, then it's
not clear that separate hashes are necessary in the first
place (some suitable canonicalization should suffice).
Separate hashes add effectively no cost to the scheme, and allow precomputation
and caching of hash values for large stored objects.
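As a sketch of that point (SHA-256 here is purely an illustrative
choice, not something anyone in the thread mandated): per-part
digests can be computed once and cached alongside large stored
objects, then combined cheaply whenever a composite value is needed.

```python
import hashlib

def part_hash(part_bytes: bytes) -> bytes:
    # Digest of one MIME part; for a large stored object this can
    # be precomputed once and cached with the object.
    return hashlib.sha256(part_bytes).digest()

def combined_hash(part_hashes: list) -> bytes:
    # Combine cached per-part digests into a single value (a
    # "hash of hashes") without rereading the parts themselves.
    h = hashlib.sha256()
    for d in part_hashes:
        h.update(d)
    return h.digest()
```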
* it should be possible to determine if some object and its
corresponding hash has been removed (or if some content
has been added)
This one is next to impossible to do IMO, and falls far outside
the goals I would have for such a mechanism.
If some part of a multipart/mixed or multipart/related
message is elided, the meaning of the message as a whole
is altered, and that should be detectable by the receiver.
Hashing the hashes, or a single hash with canonicalization,
are methods that will not present a problem here, but other
methods (e.g. simple concatenation of hashes, each checked
independently) may let such elision go undetected.
Being able to detect modification is the goal. Being able to determine what the
modification was is a nongoal.
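To make the elision point concrete (a sketch, with SHA-256 standing
in for whatever hash the scheme would actually use): if the receiver
only checks each transmitted hash against its own part, a part
removed together with its hash still verifies, whereas a hash
computed over all the per-part hashes also commits to the set of
parts and so exposes the removal.

```python
import hashlib

def digest(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def hash_of_hashes(hs) -> bytes:
    # A value that commits to the whole set of per-part digests.
    h = hashlib.sha256()
    for d in hs:
        h.update(d)
    return h.digest()

parts = [b"cover letter", b"attachment A", b"attachment B"]
hashes = [digest(p) for p in parts]

# Per-part checks alone: eliding a part *and* its hash still verifies.
elided_parts, elided_hashes = parts[:2], hashes[:2]
assert all(digest(p) == h for p, h in zip(elided_parts, elided_hashes))

# A hash over all the hashes detects the elision.
assert hash_of_hashes(elided_hashes) != hash_of_hashes(hashes)
```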
I'm well aware that in general there is no way to do header canonicalization
and hashing perfectly. The question is how far is far enough.
Or, from another perspective: how far is too far? I suspect
that anything much beyond unfolding and squeezing of linear
whitespace (after unfolding) would be going too far. One
*might* uniformly up-case or down-case all alphabetics
to try to deal with case changes of protocol elements
such as field name tags during field rewriting, but it risks
losing sensitivity to significant case changes in header
text (in the RFC 2277 sense of "text").
I pretty much agree with this assessment, modulo issues with
specific fields like CT and CTE.
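A minimal version of the canonicalization discussed above (unfold,
then squeeze linear whitespace, deliberately leaving case alone)
might look like the following sketch:

```python
import re

def canonicalize_field(field: str) -> str:
    # Unfold: remove a CRLF (or bare LF) that immediately precedes
    # continuation whitespace, i.e. undo header folding.
    unfolded = re.sub(r"\r?\n(?=[ \t])", "", field)
    # Squeeze runs of linear whitespace (after unfolding) to one SP.
    # Case is not touched, to preserve significant case in header text.
    return re.sub(r"[ \t]+", " ", unfolded)
```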
Would changing boundary markers not also change the MIME-part
header (boundary parameter in Content-Type field)?
Content-type is the one field that simply must be canonicalized in a way
that avoids this problem.
That implies not including the boundary parameter (whether
a simple parameter, or fragmented per RFC 2231) [probably
also the semicolon preceding the attribute name] in the
hash computation (before squeezing linear whitespace).
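A rough sketch of that exclusion (the regular expression is
illustrative only: it covers a simple boundary parameter and a
fragmented boundary*0=, boundary*1=... series, but not every
RFC 2231 extended-value form):

```python
import re

# Matches "; boundary=..." with an optional RFC 2231 section number
# (boundary*0, boundary*1, ...) and an optional extended-value "*",
# accepting either a quoted or an unquoted value.
_BOUNDARY = re.compile(
    r';\s*boundary(\*\d+)?\*?\s*=\s*(?:"[^"]*"|[^;\s]+)',
    re.IGNORECASE)

def content_type_for_hashing(field_body: str) -> str:
    # Drop the boundary parameter(s), including the semicolon
    # preceding each, before the field is hashed.
    return _BOUNDARY.sub("", field_body)
```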
Correct. RFC 2231 is at least part of the reason for this, although
I tend to think worrying about RFC 2231 encoding of boundary parameters