Re: 2 MIME questions re: message/rfc822


On Fri November 5 2004 13:28, ned+ietf-822(_at_)mrochek(_dot_)com wrote:

What's really needed is a generic way of computing a hash of a MIME object 
that
takes as many of these issues as possible into account. I've had the
specification of such a thing on my to-do list literally for years but I never
seem to find the time to finish writing it up.

Basically what you want to do is define a hash methodology that computes
separate hashes on leaf nodes in the MIME object and then combines those
separate hashes along with hashes of canonicalized headers and the MIME
structure itself in a specific way to arrive at a single result. The
advantages of this approach are numerous:


1. The "combines" part is likely to pose some problems because there
    are some conflicting characteristics which one would like to have:
    * it should be easy to combine and separate the hashes
    * the result should be sensitive to reordering of objects
    * it should be possible to determine if some object and its
       corresponding hash has been removed (or if some content
       has been added)
    * for large objects, a hash represents data reduction; as the
       ratio of object size to hash length increases, there is a
       reduction in sensitivity (ability to detect changes). The act
       of combining hashes should not result in additional decrease
       in sensitivity
    * for small objects, a hash may be larger than the object itself;
       as a message is split into a greater number of smaller objects
       which are individually hashed, the total size of the hashes
       grows, resulting in undesirable increased overhead
2. Canonicalization of headers presents several problems:
    a) it presupposes that the syntax and structure of each field is known,
        which can become a snag as new fields are defined, when user-
        defined or experimental fields are used, as new protocol element
        keywords (charset names, media types and subtypes, etc.) are
        registered, etc.   A similar problem exists for RFC 2047 encoded-
        words; because they can appear only in certain contexts (some
        comments, in phrases, and in unstructured fields), one needs to
        know detailed field syntax  to determine if something which
        looks like an encoded-word is in fact an encoded-word.
   b)  Unfolding, normalization and compression of whitespace are
        probably reasonable for structured fields, but differences in line
        folding, tabs vs. spaces, and quantity of whitespace characters
       may be significant in some instances in unstructured fields
        (Subject, Comments, Content-Description, etc.).
   c) Some objects are case-sensitive, others case-insensitive; that
       should be taken into account during canonicalization. However,
       in some instances is is simply not possible to determine whether
       some field text is a case-insensitive object or not.

(1) Encodings can be changed without breaking signatures. (This can help
    with handling whitespace, and it makes it possible for signatures to
    survive 8->7 conversion.)


Maybe. Change in encoding should result in change of
Content-Transfer-Encoding fields.  How would one handle that
change to the MIME-part header associated with the encoded
part, bearing in mind that once an object has been encoded, there
is no record of whether it was encoded from an original specified
as 8bit or as binary (which might or might not have been mislabeled)?

(2) Boundary markers can be changed without breaking signatures. (How
    to handle preamble and postamble text is an interesting side issue here.)


Would changing boundary markers not also change the MIME-part
header (boundary parameter in Content-Type field)?

[...]

So, is it time for me to finish the specification for this?


I suspect that 2a above might be a show-stopper.

Does anybody 
care, and more to the point, will anybody actually implement it?


And if so, will there be multiple interoperable implementations (which
continue to interoperate as new fields, charsets, media types and
subtypes, etc. develop)?   Would such a method interoperate with
current methods?  How would the different method be indicated:
MIME-Version: 2.0?  Won't there still be problems if the original
isn't in canonical form when signed (and won't more complex
rules add to that problem)?  Won't there still be problems with
non-MIME-aware message handlers and with legacy MIME
implementations?

It's an interesting idea, but
a) I'm not convinced that all of the supposed benefits are real
b) I'm not convinced that it's practical w.r.t. continuing evolution
    of header fields and field components
c) It's unclear whether it's necessary (i.e. whether it will address
    the situations where the existing signature mechanisms fail)
and
d) I suspect that additional complexity will compound the
    problem.