Re: Is 8BIT ESMTP really needed

In <200105080023(_dot_)UAA24919(_at_)astro(_dot_)cs(_dot_)utk(_dot_)edu> Keith 
Moore <moore(_at_)cs(_dot_)utk(_dot_)edu> writes:

AFAIK the only reason upconversion would ever break content-md5

my guess is that it breaks content-md5 for the case where 
bare CR or bare LF appears in the canonical form, and this 
same character is coincidentally the line ending for the stored 
form of the message at some point following upconversion.  the MD5 
computation at the recipient would see the line ending as CR LF
rather than just bare CR or bare LF as in the original.


Yes, that is the sort of case I had in mind.

Suppose your message has Content-Type: application/foo (there is never any
problem with text/plain). The body of the message is
        fooCRbarLFbazCRLF
(the naked CR and LF are a 'feature' of the application/foo protocol).
Suppose you decide to encode it in QP (maybe not the wisest choice, but
legal).

So you compute the Content-MD5 on the message exactly as above, and what
you send over the wire is:
        foo=0Dbar=0AbazCRLF
or      foo=0Dbar=0Abaz=0D=0A=CRLF
which, if it arrives at the far end intact, should verify correctly.
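
A minimal sketch of that example in Python, assuming the standard hashlib,
base64 and quopri modules; the names body, wire_a, wire_b and content_md5
are made up for illustration. It only shows that both wire forms decode
back to the bytes the sender hashed.

```python
import base64
import hashlib
import quopri

# Canonical body of the hypothetical application/foo message.
body = b"foo\rbar\nbaz\r\n"


def content_md5(data: bytes) -> str:
    """Base64 of the 128-bit MD5 digest, as carried in a Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


# The two wire forms from the example, with CRLF written out as \r\n.
wire_a = b"foo=0Dbar=0Abaz\r\n"
wire_b = b"foo=0Dbar=0Abaz=0D=0A=\r\n"

header_value = content_md5(body)

for wire in (wire_a, wire_b):
    decoded = quopri.decodestring(wire)
    # Both checks hold as long as nothing has touched the message in transit.
    print(decoded == body, content_md5(decoded) == header_value)
```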

But if some intermediate site upconverts it (to binary, perhaps) and some
later site downconverts it again, or (being a UNIX site) changes the CR to
LF and then changes both the LFs back to CRLF for MD5-checking, then
things can go wrong.
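
The UNIX-site failure can be sketched as follows. This is only an
illustration under the assumption that the site stores the upconverted
body with LF as its sole line ending; the variable names are hypothetical.

```python
import hashlib

# Body the sender hashed (after upconversion this is what travels as binary).
original = b"foo\rbar\nbaz\r\n"

# Assumed UNIX-style storage: CRLF and the naked CR all end up as LF.
stored = original.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

# Before checking, every LF is turned back into CRLF.
restored = stored.replace(b"\n", b"\r\n")

# The naked CR and the naked LF have both become CRLF, so the digest differs.
mismatch = hashlib.md5(restored).digest() != hashlib.md5(original).digest()
print(stored, restored, mismatch)
```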

I think we established, a few weeks back, that the RFC defining
Content-MD5 should really have made it clearer exactly when LF->CRLF
canonicalization had to be done. But in the absence of such guidance, I
think the following is the correct (or at least the safest) way to check a
Content-MD5 on arrival.

1. If the Content-Type is text/*, undo any encoding, convert any naked LF
to CRLF (though there shouldn't be any) and protest loudly at any naked
CR. Then check the MD5.

2. If the Content-Type is other than text/*, look at the
Content-Transfer-Encoding.

2a. If it is 7bit, 8bit or binary, do the MD5 check as received.

2b. If it is base64, undo the encoding, and do the MD5 check on the result
(naked CR and LF remain as such).

2c. If it is QP, ensure that all line endings are in the form CRLF (just
in case you were on a UNIX box that had turned them into LF). Then undo
the encoding (which may expose some naked CR and LF) and then do the
check.
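
The steps above might be sketched like this in Python. It is not a
definitive implementation: the function names (undo_encoding,
bytes_to_hash, md5_matches) are hypothetical, header parsing is assumed to
have happened already, and only the standard base64, hashlib, quopri and
re modules are used.

```python
import base64
import hashlib
import quopri
import re

NAKED_CR = re.compile(rb"\r(?!\n)")   # CR not followed by LF
NAKED_LF = re.compile(rb"(?<!\r)\n")  # LF not preceded by CR


def undo_encoding(cte: str, raw: bytes) -> bytes:
    """Undo the Content-Transfer-Encoding; 7bit, 8bit and binary pass through."""
    if cte == "base64":
        return base64.b64decode(raw)
    if cte == "quoted-printable":
        return quopri.decodestring(raw)
    return raw


def bytes_to_hash(content_type: str, cte: str, raw: bytes) -> bytes:
    """Return the bytes the MD5 should be computed over (steps 1 and 2a-2c)."""
    cte = cte.strip().lower()
    if content_type.strip().lower().startswith("text/"):
        # Step 1: decode, protest at naked CR, convert naked LF to CRLF.
        data = undo_encoding(cte, raw)
        if NAKED_CR.search(data):
            raise ValueError("naked CR in a text/* body")
        return NAKED_LF.sub(b"\r\n", data)
    if cte == "quoted-printable":
        # Step 2c: restore CRLF line endings in the encoded form first.
        raw = NAKED_LF.sub(b"\r\n", raw)
    # Steps 2a and 2b: as received, or base64-decoded with naked CR/LF kept.
    return undo_encoding(cte, raw)


def md5_matches(header_value: str, content_type: str, cte: str, raw: bytes) -> bool:
    """Compare the Content-MD5 header value with the digest of the body."""
    digest = hashlib.md5(bytes_to_hash(content_type, cte, raw)).digest()
    return base64.b64encode(digest).decode("ascii") == header_value.strip()
```

For instance, the application/foo body from earlier still verifies under
this sketch when it arrives in QP with its final line ending reduced to a
bare LF, because the CRLF is put back before decoding.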

The last case is the tricky one. You may find:
        Naked CR that had been encoded as =0D
        Naked LF that had been encoded as =0A
        CRLF that had been encoded as =0D=0A
        CRLF that had not been encoded at all
        "nothing" that had been encoded as =CRLF

Note: One assumes that the person who sent the message, having chosen
(perhaps foolishly) to use QP, has sent it in a form suited to the above.

But upconversion at an intermediate site might well break it (or, again,
it might not).
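
For what it is worth, the five QP cases listed above can be checked against
Python's quopri module (an assumption on my part that its decoder is
representative; the cases list is just an illustration). The last line
shows why the CRLF has to be restored before decoding: an unencoded line
ending that has become a bare LF decodes to a bare LF.

```python
import quopri

# (encoded form on the wire, bytes expected after decoding)
cases = [
    (b"=0D", b"\r"),        # naked CR encoded as =0D
    (b"=0A", b"\n"),        # naked LF encoded as =0A
    (b"=0D=0A", b"\r\n"),   # CRLF encoded as =0D=0A
    (b"\r\n", b"\r\n"),     # CRLF not encoded at all
    (b"=\r\n", b""),        # "nothing": a soft line break
]

results = [quopri.decodestring(wire) == expected for wire, expected in cases]
print(results)

# A literal line ending that a UNIX box reduced to LF stays LF after decoding.
lf_only = quopri.decodestring(b"baz\n")
print(lf_only == b"baz\r\n")
```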

-- 
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133   Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk      Snail: 5 Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9      Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5