
Re: Content-Canonicalization: crlf?

1995-12-11 15:06:29
The issue I was trying to address is how a MUA knows whether or not it
should convert CRLF into the local line convention for a given MIME
leaf part.

This is entirely determined by content-type.  In general, senders must convert 
a body part to canonical form before applying any content-transfer-encoding 
(even if that encoding is 7bit, 8bit, or unspecified), and receivers must 
convert canonical form (what they get after undoing content-transfer-encoding) 
to local form if the tools used to present the body part require this.

The specification of whether those tools require such conversion doesn't 
belong in the body part header; it belongs in the mailcap or whatever file is 
used to specify presentation.

The conversion between CRLF and local line convention is just one example of 
the conversion between canonical form and local form.  Fortunately, for most 
content-types on most machines there is no difference between the canonical 
form and local form, but the model applies anyway.
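As a concrete sketch of that model (assuming "\n" is the local line
convention, as on UNIX; the function names are mine, not from any spec):

```python
# A minimal sketch of the canonical-form model.  Assumes the local line
# convention is "\n"; the names are illustrative only.

def to_canonical(local_text):
    # Sender side: convert local line ends to canonical CRLF *before*
    # any content-transfer-encoding is applied.
    return local_text.replace("\n", "\r\n")

def to_local(canonical_text):
    # Receiver side: after undoing the content-transfer-encoding,
    # convert canonical CRLF back to the local line convention.
    return canonical_text.replace("\r\n", "\n")
```

For most content-types on most machines both functions would be the
identity, but the order of operations (canonicalize, then encode; decode,
then localize) stays the same.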

To be concrete, the sender of type application/script-z follows Appendix G
and first creates the local form, a plain text file with script code in it.
Then it converts the line ends from the local format to CRLF, applies B64
encoding and sends it.  The receiver removes the B64 encoding and then does
not know whether or not to convert the CRLFs to local line convention.

The sender of this type has no business doing such a conversion unless it 
knows that the canonical form for application/script-z is in fact text with 
CRLF line endings.

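To make the application/script-z example concrete, here is a hedged sketch
of the round trip (the type and function names are hypothetical, and a
"\n" local convention is assumed):

```python
import base64

def encode_script_z(local_text):
    # Sender: local form -> canonical CRLF form, then base64, as in the
    # encoding process described above.
    canonical = local_text.replace("\n", "\r\n")
    return base64.b64encode(canonical.encode("ascii"))

def decode_script_z(wire_data):
    # Receiver: undoing base64 recovers the canonical form, but the
    # bytes alone do not say whether converting CRLF to the local
    # convention is safe -- that has to come from the definition of
    # the content-type itself.
    return base64.b64decode(wire_data).decode("ascii")
```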
Appendix G only describes the encoding process and not the decoding process
where this becomes an issue.  If canonicalization is implied by the content
type, it is not stated anywhere.  For example, the description of
content-type text doesn't say that its canonical representation has CRLF
line endings.  Right now there are differences in implementations: munpack
converts CRLF for type text/* only, while Pine converts for type text/* and
message/rfc822.  It seems to me that some text describing the decoding
process, together with a statement that text/* should always have CRLF line
endings, would be a good thing.

Yes, this is a problem, and it needs to be fixed.  Every content-type 
definition should clearly define the canonical form.
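Once each content-type's canonical form is pinned down, a receiver could
express the decision as something like the following (a hypothetical
sketch; the parameter stripping is deliberately simplistic):

```python
def should_convert_crlf(content_type):
    # Decide, from the content-type alone, whether the decoded body's
    # CRLFs may be converted to the local line convention.  This
    # matches Pine's behavior as described above (text/* plus
    # message/rfc822); munpack would keep only the text/* branch.
    base = content_type.split(";", 1)[0].strip().lower()
    major = base.split("/", 1)[0]
    return major == "text" or base == "message/rfc822"
```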

Going on a bit further, I think there are some things about current
practice that make this issue more confusing: implementations which store
822 messages as text files usually convert them to local end of line
conventions before any MIME parsing or decoding is ever performed.  This is
true of sendmail and smail implementations on UNIX and thus true for any
POP clients talking to UNIX servers.  The result is that *any* content type
that is not B64 encoded will have all its CRLFs converted to the local
line convention.  Thus far this isn't a huge problem, but it is probably not
well understood.  Also, I believe it is possible that implementors of some
content types may try to take advantage of this.  For example if
application/script-z mentioned above were to not use B64 encoding and leave
its line endings exposed, they would get converted to the local line
format.  This is probably somewhat perverse, but I think it may be assumed
by new MIME implementors who look at how things currently work and don't
think about the spec carefully.

I agree with this also.  The problem is that it's difficult to convey the
notion of the encoding model.  Too little prose and the subtlety gets lost; 
too much prose and the reader is overwhelmed.