In <01K29L5Z88XK001NOM(_at_)mauve(_dot_)mrochek(_dot_)com>
ned(_dot_)freed(_at_)mrochek(_dot_)com writes:
But what is textual data?
The basic rule I use is anything under the top-level text type or anything
that's encoded using 7bit or 8bit. (The full set of rules I use is actually a
lot more complex, but this is to deal with operating systems that support more
complex file organizations than simple streams.)
OK, I see that makes sense. Presumably anything sent (or to be sent) Q-P
or Base64 is computed as it decodes (whether that turns out to be
canonical or not). Anything that is sent 8bit, and then encoded into
something different en route will presumably have been computed and sent
canonically, so the encoding will be on the canonical form and should be
checked as such. The only problem I can foresee is if some intermediate
agent decides to decode the base64, then canonicalises it before passing
it onwards. That might cause an MD5 failure at the ultimate destination,
but I cannot understand why an intermediate site should think of doing
such a thing, even if it could see Content-Type text/plain (but, you never
know :-( ).
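
To make that failure case concrete, here is a minimal sketch in Python. It
assumes the reading above, namely that the digest is taken over the bytes as
they decode. The names md5_b64 and meddling_relay, and the sample payload, are
purely illustrative and come from no RFC or real mail agent.

```python
import base64
import hashlib

def md5_b64(data: bytes) -> str:
    # Content-MD5 style value: base64 of the 16-byte MD5 digest.
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

def meddling_relay(encoded_body: bytes) -> bytes:
    # Hypothetical intermediate agent: decode the base64, rewrite every
    # line terminator as CRLF, then re-encode before passing it onwards.
    decoded = base64.b64decode(encoded_body)
    rewritten = decoded.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    return base64.b64encode(rewritten)

# Illustrative payload with naked LFs, sent as base64.
payload = b"line one\nline two\n"
declared = md5_b64(payload)          # what the sender puts in the header
sent = base64.b64encode(payload)

# A relay that leaves the body alone keeps the check valid ...
untouched_ok = md5_b64(base64.b64decode(sent)) == declared
# ... one that canonicalises the decoded bytes does not.
meddled_ok = md5_b64(base64.b64decode(meddling_relay(sent))) == declared
```

Under those assumptions the untouched body still verifies and the rewritten
one does not, since the relay has changed the very bytes the digest covers.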
I can also see a problem if my postscript with naked NLs is decoded from
base64 at the far end, and then immediately canonicalized (because that's
what the remote postscript interpreter expects). One needs to be sure that
the MD5 checking is done by the mail agents before that happens, and not
by any postscript agent. But that is as it should be.
However, all this ought really to have been spelled out explicitly in the
RFC, which certainly seems to be ambiguous as written, though your
interpretation would seem to be the only one which makes sense.
This is basically just common sense: Agents routinely mess with
the line terminators of text subtypes and things encoded as 7bit or
8bit, so these are the cases that need to be canonicalized.
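
As a rough sketch only, the basic rule stated earlier (anything under the
top-level text type, or anything encoded as 7bit or 8bit, gets its line
terminators canonicalized to CRLF before the digest is taken) could be written
in Python as follows. The function names are hypothetical, and the more
complex rules mentioned for other file organizations are not modelled.

```python
import base64
import hashlib
import re

def canonicalize_text(data: bytes) -> bytes:
    # Rewrite CRLF, bare CR and bare LF uniformly as CRLF.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)

def is_textual(content_type: str, cte: str) -> bool:
    # Basic rule only: top-level type "text", or a 7bit/8bit encoding.
    top_level = content_type.split("/", 1)[0].strip().lower()
    return top_level == "text" or cte.strip().lower() in ("7bit", "8bit")

def content_md5(data: bytes, content_type: str, cte: str) -> str:
    # Digest the canonical form for textual data, the raw bytes otherwise.
    if is_textual(content_type, cte):
        data = canonicalize_text(data)
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
```

With this sketch, a text/plain body gives the same value whether it is stored
locally with LF or CRLF, while application/postscript carried as base64 is
digested byte for byte, so any change of line terminators shows up as a
mismatch.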
Now I can see that Content-Type: text/plain is textual, and doubtless
text/html likewise. And application/some-binary-executable is clearly not
textual (and arbitrary changes of CRLF to LF, or whatever the local notation
demanded, would be disastrous).
But what about application/postscript? That is certainly readable as
text,
Hardly. In general Postscript is NOT text. It can contain arbitrary binary
sequences and even the parts that look to you like text can be sensitive to
what line terminators are used. (The format includes multiline byte counted
strings.) Unless PostScript is being carried around as 7bit or 8bit text you
have to treat it as binary.
Hmmm! All the postscript binary I have ever seen has been textualized in
hex or base64 or something, and provided with a postscript procedure to
read it in. I grant that one could write postscript that included pure
binary (and the means to read it in), but I have never seen any like that.
This is described in the MIME RFCs, BTW.
I saw nothing relevant in RFC2046 where application/postscript is
described. BTW, and off topic, there are dire warnings in there about nasty
side effects of blindly obeying postscript sent by mail. Is it the case
that anything sent as PDF rather than postscript is immune from that?
There are two attachments to this message. One is that postscript in
base64, and the other is exactly the same file without encoding (I
may have some difficulty in persuading my system to send it without
encoding,
Sounds like your system knows what it is doing.
Well almost. I had to hack the message manually and pipe it into sendmail
to make sure it went out as I intended :-(. I would still be curious to
know if anyone found that the two versions did or did not satisfy the MD5
check, and did or did not display correctly (esp. on windoze boxes).
--
Charles H. Lindsey ---------At Home, doing my own thing------------------------
Tel: +44 161 436 6131 Fax: +44 161 436 6133 Web: http://www.cs.man.ac.uk/~chl
Email: chl(_at_)clw(_dot_)cs(_dot_)man(_dot_)ac(_dot_)uk Snail: 5
Clerewood Ave, CHEADLE, SK8 3JU, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5