ietf-822

Content-MD5

2001-04-11 06:54:58
I have been trying to write a program to generate Content-MD5 headers
for MIME objects, and find some difficulty in interpreting RFC 1864.

What it says there is that I am to compute the MD5 algorithm on "the
canonical form of the MIME entity's object", which means the form
before any Content-Transfer-Encoding (or after decoding same, if at the
receiving end). So far so good.

It then says:

"For textual data, this means the MD5 algorithm must be computed on data
in which the canonical form for newlines applies, that is, in which each
newline is represented by a CR-LF pair."
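
To make the computation concrete, here is a minimal Python sketch of how I read that passage. The names canonicalize_newlines and content_md5 and the "textual" flag are my own, for illustration only; deciding when to set that flag is exactly the open question. The header value is the base64 encoding of the 16-byte MD5 digest, as RFC 1864 specifies.

```python
import base64
import hashlib
import re


def canonicalize_newlines(data: bytes) -> bytes:
    """Rewrite every CRLF, bare CR, or bare LF as a CRLF pair."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)


def content_md5(body: bytes, textual: bool) -> str:
    """Return a Content-MD5 header value for an undecoded-free body.

    `body` is the entity before any Content-Transfer-Encoding is applied.
    `textual` is a hypothetical switch: the caller decides whether the
    media type counts as "textual data" and so needs CRLF newlines.
    """
    if textual:
        body = canonicalize_newlines(body)
    digest = hashlib.md5(body).digest()
    return base64.b64encode(digest).decode("ascii")
```

With this sketch, content_md5(b"a\nb\n", textual=True) and content_md5(b"a\nb\n", textual=False) give different values, which is why the classification matters.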

But what is textual data? Now I can see that Content-Type:
text/plain is textual, and doubtless text/html likewise. And
application/some-binary-executable is clearly not textual (and arbitrary
changes of CRLF to LF, or whatever the local notation demanded, would be
disastrous).

But what about application/postscript? That is certainly readable as
text, and there is no special need to encode it as base64. Or image/fig
(don't know whether that is a recognised application type, but fig is
certainly a way of specifying images, and it comes as text). Or any
application/foo which the recipient might not understand, but would at
least like to check that the MD5 agrees?

So I did some experimentation with Sun's dtmail (which has known bugs in
its Content-MD5 handling, but at least seems to get it right for attachments). I
gave it a shell script: it decided it was text/plain, and put the CRs in
before computing the MD5. I then constructed a postscript file (draws
a little red circle) which, of course, on my Solaris system has lines
terminated with LF only. It recognised that application/postscript was
needed, it computed the MD5 on the LF version, and then encoded it in
base64 (which seems a neat way to pass the problem onto someone else).
But is it correct? And what if I choose to leave the encoding at 7bit?
Or if I receive an application/postscript in 7bit and want to check the
MD5?
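
For anyone wanting to run the check at the receiving end, a rough Python sketch follows. It undoes the Content-Transfer-Encoding and then tries the digest three ways: on the bytes as received, with LF-only newlines, and with CRLF newlines. The function names and the returned dictionary are hypothetical, and only the base64 and identity encodings are handled; it is meant to show which convention a sender used, not to settle which one is correct.

```python
import base64
import hashlib


def md5_b64(data: bytes) -> str:
    """Base64 of the MD5 digest, the form used in the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def decode_body(raw: bytes, cte: str) -> bytes:
    """Undo the Content-Transfer-Encoding (base64 or identity only)."""
    cte = cte.strip().lower()
    if cte == "base64":
        return base64.b64decode(raw)
    if cte in ("7bit", "8bit", "binary"):
        return raw
    raise ValueError("encoding not handled in this sketch: " + cte)


def matching_forms(raw: bytes, cte: str, header_value: str) -> dict:
    """Report which newline convention reproduces the Content-MD5 value."""
    body = decode_body(raw, cte)
    as_lf = body.replace(b"\r\n", b"\n")
    as_crlf = as_lf.replace(b"\n", b"\r\n")
    return {
        "as_received": md5_b64(body) == header_value,
        "lf": md5_b64(as_lf) == header_value,
        "crlf": md5_b64(as_crlf) == header_value,
    }
```

If a 7bit body arrives with CRLF line endings from the transport but its digest was computed on LF only, this reports a match for "lf" alone, which is the awkward case I am asking about.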

There are two attachments to this message. One is that postscript in
base64, and the other is exactly the same file without encoding (I
may have some difficulty in persuading my system to send it without
encoding, and even if I do some intermediate transport agent might try
to munge it). Both have the same MD5, computed on the LF only. Assuming
that they arrive at your end without further encoding changes, I would
be interested to see whether you all agree on whether the MD5 is correct
in both cases, and also what happens if you save them both to a file, or
even try to display them.

Cc to Ian Bell at Turnpike, where they take Content-MD5 seriously.

Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

Attachment: red-base64.ps
Description: red-base64.ps

Attachment: red-7bit.ps
Description: red-7bit.ps
