On Fri, November 5, 2004 13:28, ned+ietf-822@mrochek.com wrote:
> What's really needed is a generic way of computing a hash of a MIME object
> that takes as many of these issues as possible into account. I've had the
> specification of such a thing on my to-do list literally for years but I
> never seem to find the time to finish writing it up.
>
> Basically what you want to do is define a hash methodology that computes
> separate hashes on leaf nodes in the MIME object and then combines those
> separate hashes along with hashes of canonicalized headers and the MIME
> structure itself in a specific way to arrive at a single result. The
> advantages of this approach are numerous:
1. The "combines" part is likely to pose some problems, because there
are conflicting characteristics that one would like to have:
* it should be easy to combine and separate the hashes
* the result should be sensitive to reordering of objects
* it should be possible to determine if some object and its
corresponding hash has been removed (or if some content
has been added)
* for large objects, a hash represents data reduction; as the
ratio of object size to hash length increases, there is a
reduction in sensitivity (ability to detect changes). The act
of combining hashes should not result in additional decrease
in sensitivity
* for small objects, a hash may be larger than the object itself;
as a message is split into a greater number of smaller objects
which are individually hashed, the total size of the hashes
grows, resulting in undesirable increased overhead
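To make the conflict concrete, here is a minimal sketch (Python, SHA-256; the function names are mine, not from any specification) of one plausible "combine" step: a Merkle-style interior hash over the count-prefixed, ordered child digests. It is order-sensitive, detects added or removed parts, and keeps the combined result a single hash long -- but it deliberately gives up the first desideratum, since the combined digest cannot be separated back into its parts.

```python
import hashlib

def leaf_hash(body):
    """Digest of one leaf part's (canonicalized) content."""
    return hashlib.sha256(body).digest()

def combine_hashes(part_hashes):
    """Merkle-style interior node: hash the count-prefixed,
    ordered concatenation of the child digests.  The count prefix
    detects removal/addition of parts; feeding digests in order
    detects reordering.  The result is one hash long regardless of
    how many parts there are, but it cannot be "separated" again.
    """
    h = hashlib.sha256()
    h.update(len(part_hashes).to_bytes(4, "big"))  # part count
    for d in part_hashes:                          # order matters
        h.update(d)
    return h.digest()

a = leaf_hash(b"part one")
b = leaf_hash(b"part two")
assert combine_hashes([a, b]) != combine_hashes([b, a])  # reordering detected
assert combine_hashes([a, b]) != combine_hashes([a])     # removal detected
```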
2. Canonicalization of headers presents several problems:
a) It presupposes that the syntax and structure of each field is known,
which becomes a snag as new fields are defined, as user-defined or
experimental fields are used, as new protocol-element keywords
(charset names, media types and subtypes, etc.) are registered,
and so on. A similar problem exists for RFC 2047 encoded-words:
because they can appear only in certain contexts (in some
comments, in phrases, and in unstructured fields), one needs to
know detailed field syntax to determine whether something that
looks like an encoded-word is in fact an encoded-word.
b) Unfolding, normalization and compression of whitespace are
probably reasonable for structured fields, but differences in line
folding, tabs vs. spaces, and quantity of whitespace characters
may be significant in some instances in unstructured fields
(Subject, Comments, Content-Description, etc.).
c) Some objects are case-sensitive, others case-insensitive; that
should be taken into account during canonicalization. However,
in some instances it is simply not possible to determine whether
some field text is a case-insensitive object or not.
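A naive canonicalizer makes problems (b) and (c) visible. The sketch below (my own illustration, not any standard's canonical form) lowercases the field name, unfolds continuation lines, and compresses whitespace -- reasonable for structured fields, but exactly the transformation that is unsafe for unstructured fields such as Subject, and it makes no attempt at the per-element case rules of point (c).

```python
import re

def canonicalize_structured(name, value):
    """Naive canonical form for a *structured* header field:
    lowercase the field name, unfold continuation lines, and
    collapse runs of whitespace.  As noted above, this is unsafe
    for unstructured fields (Subject, Comments, ...), where
    whitespace differences may be significant, and it ignores the
    case-sensitivity rules of individual protocol elements."""
    value = re.sub(r"\r?\n[ \t]+", " ", value)     # unfold
    value = re.sub(r"[ \t]+", " ", value).strip()  # compress whitespace
    return name.lower() + ":" + value

# Two foldings of the same Content-Type field canonicalize identically:
v1 = 'multipart/mixed;\r\n\t boundary="x"'
v2 = 'multipart/mixed; boundary="x"'
assert canonicalize_structured("Content-Type", v1) == \
       canonicalize_structured("CONTENT-TYPE", v2)
```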
> (1) Encodings can be changed without breaking signatures. (This can help
> with handling whitespace, and it makes it possible for signatures to
> survive 8->7 conversion.)
Maybe. A change in encoding should result in a change to the
Content-Transfer-Encoding field. How would one handle that
change to the MIME-part header associated with the encoded
part, bearing in mind that once an object has been encoded, there
is no record of whether it was encoded from an original specified
as 8bit or as binary (which might or might not have been mislabeled)?
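The "maybe" can be made precise with a sketch (mine, hypothetical): hashing the *decoded* octets makes the digest invariant under re-encoding, but it does nothing for the ambiguity just noted, because once encoded there is no record of whether the original was labeled 8bit or binary.

```python
import base64
import hashlib
import quopri

def decoded_body_hash(body_bytes, cte):
    """Hash the *decoded* octets, so a transport re-encoding
    (e.g. base64 <-> quoted-printable) does not change the digest.
    This does NOT resolve the 8bit-vs-binary ambiguity discussed
    above: the decoded octets carry no record of the original
    label, and line-ending canonicalization differs between the two."""
    cte = cte.lower()
    if cte == "base64":
        data = base64.b64decode(body_bytes)
    elif cte == "quoted-printable":
        data = quopri.decodestring(body_bytes)
    else:  # 7bit / 8bit / binary: hash the octets as-is
        data = body_bytes
    return hashlib.sha256(data).hexdigest()

raw = "caf\u00e9\r\n".encode("utf-8")          # 8-bit content, CRLF line end
b64 = base64.b64encode(raw)
qp = quopri.encodestring(raw)
assert decoded_body_hash(b64, "base64") == \
       decoded_body_hash(qp, "quoted-printable")
```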
> (2) Boundary markers can be changed without breaking signatures. (How
> to handle preamble and postamble text is an interesting side issue here.)
Would changing the boundary markers not also change the MIME-part
header (the boundary parameter in the Content-Type field)?
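It would, which means such a scheme has to special-case that parameter: the header hash must omit the boundary parameter (and, by extension, define what happens to preamble and postamble text). A hypothetical sketch of that exemption, with names of my own invention:

```python
import hashlib

def structure_token(media_type, params):
    """Digest of a Content-Type field with the boundary parameter
    removed, so rewriting the boundary markers (and the matching
    boundary= parameter) does not invalidate the digest.  Media
    type and parameter names are matched case-insensitively;
    parameter values are left as-is."""
    kept = {k.lower(): v for k, v in params.items()
            if k.lower() != "boundary"}
    canon = media_type.lower() + "".join(
        ";" + k + "=" + kept[k] for k in sorted(kept))
    return hashlib.sha256(canon.encode()).hexdigest()

# Rewriting the boundary leaves the structure digest unchanged:
t1 = structure_token("multipart/mixed", {"boundary": "==1234=="})
t2 = structure_token("Multipart/Mixed", {"Boundary": "next_part_00"})
assert t1 == t2
```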
[...]
> So, is it time for me to finish the specification for this?
I suspect that 2a above might be a show-stopper.
> Does anybody
> care, and more to the point, will anybody actually implement it?
And if so, will there be multiple interoperable implementations (which
continue to interoperate as new fields, charsets, media types and
subtypes, etc. develop)? Would such a method interoperate with
current methods? How would the different method be indicated:
MIME-Version: 2.0? Won't there still be problems if the original
isn't in canonical form when signed (and won't more complex
rules add to that problem)? Won't there still be problems with
non-MIME-aware message handlers and with legacy MIME
implementations?
It's an interesting idea, but
a) I'm not convinced that all of the supposed benefits are real
b) I'm not convinced that it's practical w.r.t. continuing evolution
of header fields and field components
c) It's unclear whether it's necessary (i.e. whether it will address
the situations where the existing signature mechanisms fail)
and
d) I suspect that additional complexity will compound the
problem.