Re: Content-Canonicalization: crlf?

Ugh, I guess I wasn't very clear, since I was only understood in one
private response.

I am certainly not advocating sending anything but canonical
CRLF-terminated text on the wire.  I've been around long enough to have seen
this discussion before, and have implemented enough MIME stuff to know that
doing otherwise is impractical and useless and would lead to each MUA having
to implement a multitude of text formats.

The issue I was trying to address is how a MUA knows whether it should
convert CRLF into the local line convention for a given MIME leaf part.
The current practice breaks down for a type under application/* that
carries textual data which should be converted to the local line format.
One example that comes to mind is a scripting language that is executed
locally (security implications aside) as well as edited.

To be concrete, the sender of type application/script-z follows Appendix G
and first creates the local form, a plain text file with script code in it.
Then it converts the line ends from the local format to CRLF, applies B64
encoding and sends it.  The receiver removes the B64 encoding and then does
not know whether to convert the CRLFs to the local line convention.
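
As a rough sketch of that round trip in Python (assuming a UNIX-style
local line convention; encode_leaf and decode_leaf are names made up for
illustration, not anything from the spec):

```python
import base64

LOCAL_EOL = "\n"  # assumption: UNIX-style local line convention

def encode_leaf(local_text):
    # Sender: local line ends -> CRLF, then base64.
    canonical = local_text.replace(LOCAL_EOL, "\r\n").encode("ascii")
    return base64.b64encode(canonical)

def decode_leaf(encoded):
    # Receiver: strip the base64 layer; nothing here reveals whether
    # the CRLFs are canonical line ends or literal data bytes.
    return base64.b64decode(encoded)

script = "echo hello\necho world\n"
decoded_script = decode_leaf(encode_leaf(script))

# A binary payload that happens to contain CR LF and was sent without
# any canonicalization step at all.
binary = b"\x00\r\n\x01"
decoded_binary = decode_leaf(base64.b64encode(binary))
```

Both decoded values contain CRLF, and the receiver has only the content
type to go on when deciding whether to convert them.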

My suggestion for Content-Canonicalization was that the field have only two
values: "crlf" or none (meaning binary) to indicate whether or not
canonicalization has been applied.  Maybe other canonicalizations could be
added later if one becomes clear for a set of content types, but it's
probably best to ignore that completely right now.
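
To illustrate how a receiver might use such a field, here is a minimal
Python sketch.  Content-Canonicalization is only my proposal, not an
existing header, and to_local_form and its parameters are hypothetical:

```python
def to_local_form(decoded, canonicalization=None, local_eol=b"\n"):
    # canonicalization is the value of the proposed
    # Content-Canonicalization field, or None if the field is absent.
    if canonicalization is not None and canonicalization.strip().lower() == "crlf":
        # Sender declared CRLF canonical form: convert to local line ends.
        return decoded.replace(b"\r\n", local_eol)
    # No field: treat the part as binary and leave every byte alone.
    return decoded

as_text = to_local_form(b"a\r\nb\r\n", "crlf")
as_binary = to_local_form(b"a\r\nb\r\n")
```

The receiver no longer has to guess from the content type; absence of
the field simply means "do not touch".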

Larry's message suggested another possibility -- that the indication that
canonicalization has been applied is implied by the content-type.  This
does seem reasonable and it seems to be the current practice, though I
haven't found any place that this is explicitly stated.

Appendix G only describes the encoding process and not the decoding process
where this becomes an issue.  If canonicalization is implied by the content
type, it is not stated anywhere.  For example, the description of
content-type text doesn't say that its canonical representation has CRLF
line endings.  Right now there are differences in implementations.  Munpack
converts CRLF for type text/* only, and Pine converts for type text/* and
message/rfc822.  It seems to me that some text describing the decoding
process, and a statement that text/* should always have CRLF line endings,
would be a good thing.
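
The difference between the two behaviours can be sketched in Python as
below.  This models only the rules as I described them above, not the
actual source of either program, and all the function names are
invented for the example:

```python
def munpack_rule(content_type):
    # Convert CRLF for text/* only.
    return content_type.lower().startswith("text/")

def pine_rule(content_type):
    # Convert CRLF for text/* and message/rfc822.
    ct = content_type.lower()
    return ct.startswith("text/") or ct == "message/rfc822"

def decode_to_local(decoded, content_type, rule, local_eol=b"\n"):
    # Apply the implementation's rule after the transfer encoding
    # has already been removed.
    if rule(content_type):
        return decoded.replace(b"\r\n", local_eol)
    return decoded

part = b"Subject: hi\r\n\r\nbody\r\n"
via_munpack = decode_to_local(part, "message/rfc822", munpack_rule)
via_pine = decode_to_local(part, "message/rfc822", pine_rule)
script = decode_to_local(b"run\r\n", "application/script-z", pine_rule)
```

The same message/rfc822 part comes out differently under the two rules,
and application/script-z is left with CRLFs under both.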


Going on a bit further, I think there are some things about current
practice that make this issue more confusing: implementations which store
822 messages as text files usually convert them to local end of line
conventions before any MIME parsing or decoding is ever performed.  This is
true of sendmail and smail implementations on UNIX and thus true for any
POP clients talking to UNIX servers.  The result is that *any* content type
that is not B64 encoded will have all its CRLFs converted to the local
line convention.  Thus far this isn't a huge problem, but it is probably not
well understood.  Also, I believe it is possible that implementors of some
content types may try to take advantage of this.  For example if
application/script-z mentioned above were to not use B64 encoding and leave
its line endings exposed, they would get converted to the local line
format.  This is probably somewhat perverse, but I think it may be assumed
by new MIME implementors that look at how things currently work and don't
think about the spec carefully.
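
A small Python sketch of that side effect, assuming the message store
simply rewrites every CRLF to LF before any MIME processing
(store_locally is a made-up stand-in for what the delivery agent does):

```python
import base64

def store_locally(wire_bytes):
    # Assumed behaviour: blanket CRLF -> LF on the whole message,
    # before any MIME parsing or decoding.
    return wire_bytes.replace(b"\r\n", b"\n")

content = b"line one\r\nline two\r\n"

# Part sent with no transfer encoding: its line ends are exposed.
stored_plain = store_locally(content)

# Same content sent B64 encoded: only the encoded line is touched,
# so the CRLFs inside survive until the MUA decodes the part.
wire_b64 = base64.b64encode(content) + b"\r\n"
stored_b64 = base64.b64decode(store_locally(wire_b64))
```

The unencoded part arrives already in local form, while the B64 part
still has its CRLFs after decoding, regardless of content type.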

The use of security multiparts (or encryption of any MIME data) is the
thing that first brought this issue to my attention because there may be no
end of line canonicalization on the whole MIME-gram before the MIME
decoding.  I believe not performing conversion to the local format until
MIME parsing is complete may actually be correct, but it is different from
what mostly happens with UNIX implementations today.  The fact that Pine
currently converts message/rfc822 to local line endings seems to be an
emulation of what currently happens on UNIX implementations and may be
wrong.  (I actually wrote the first version of this in Pine, so it could be
my fault that message/rfc822 is converted -- it's been too long and I can't
remember.)

So, I do think something needs to be clarified (aside from my own messages).

Laurence Lundblade     <lgl(_at_)qualcomm(_dot_)com>
Qualcomm, Inc.         619-274-4229