Re: Newline problem: Another stab

Date: Tue,  3 Mar 1992 09:47:10 -0500 (EST)
From: Nathaniel Borenstein <nsb(_at_)thumper(_dot_)bellcore(_dot_)com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
To: MIME <ietf-822(_at_)dimacs(_dot_)rutgers(_dot_)edu>
Subject: Newline problem:  Another stab
References: <9203030356(_dot_)AA23977(_at_)sran8(_dot_)sra(_dot_)co(_dot_)jp>

Sigh.  Why does this one little thing seem to be so hard?


Because we are dealing with two kinds of ends-of-lines here -- those
in the text, and those in the resulting MIME message, and it is easy
to get the two confused.   

In addition, we are in the habit of using CRLF to mean "an end of line (using
local system conventions)" when stored on a computer system, but 0D 0A when
transmitted by SMTP.  For plain old 822 mail it's easy to read the RFC and
mentally substitute "end-of-line" when reading CRLF, because there's only one
kind of CRLF.  For MIME mail we have two kinds of CRLF.

This doesn't make sense to me.  Now it sounds like you're proposing the
following -- using UNIX as an example, but far from the only one:

1.   UNIX UA lets user compose mail, with LF for newlines.

2.  Before QP encoding, LFs are changed to CRLFs.  (This is the step I
believe to be wrong.)

3.  QP encoding takes place.

4.  QP output has CRLFs changed BACK TO LF(!!!).

5.  Sendmail gets the mail with LFs and changes them to CRLFs.


Nathaniel,

This is exactly what I think we need.  I know it sounds silly, but it is far
simpler than making exceptions for text body parts and quoted-printable.

I would generalize the procedure  you outlined above to all types of body
parts and all content-transfer-encodings, as follows:

1.  Body part is "composed" somehow, in some "native" format.  This might be
a UNIX-style text file, or a Sun raster image, or audio samples in a
system-dependent format, whatever.

2.  Before a content-transfer-encoding is applied to a body part, the body
part is first converted to "canonical" format.  Continuing the examples
above, the canonical format might be a CRLF-delimited text file, a GIF file,
and audio samples according to the audio/basic spec.

3.  Content-transfer-encoding is applied.

4.  The encoded object is inserted into a MIME-message with appropriate body
part headers and boundary markers.

Note that after step 1, any object to be encoded is just an octet stream, and
the rules are the same no matter which content-transfer-encoding gets applied.
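
To make the four steps concrete, here is a minimal sketch in Python.  The
function names, the boundary string "xyz", and the choice of base64 are my own
assumptions for illustration, not anything the MIME draft specifies.  It
assumes a UNIX-style native text file whose only line terminator is LF.

```python
import base64

def canonicalize_text(native: bytes) -> bytes:
    """Step 2 for UNIX-style text: every LF becomes CR LF (0D 0A).

    Assumes the native form uses bare LF only (no CR LF already present).
    """
    return native.replace(b"\n", b"\r\n")

def make_body_part(canonical: bytes, content_type: str,
                   boundary: str, encode=base64.encodebytes) -> bytes:
    """Steps 3 and 4: encode the octet stream, then add headers and boundary.

    `encode` is any callable from octets to octets; it never needs to know
    whether the canonical form came from text, an image, or audio.
    Header lines here use the local (LF) convention for the message itself.
    """
    encoded = encode(canonical)                      # step 3
    headers = (
        f"--{boundary}\n"
        f"Content-Type: {content_type}\n"
        "Content-Transfer-Encoding: base64\n"        # matches the default encoder
        "\n"
    ).encode("ascii")
    return headers + encoded                         # step 4

native = b"first line\nsecond line\n"               # step 1: native UNIX text
canonical = canonicalize_text(native)                # step 2
part = make_body_part(canonical, "text/plain; charset=US-ASCII", "xyz")
```

Decoding the base64 payload of `part` gives back the canonical octets, CR LF
pairs included, no matter what line convention the surrounding message uses.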

In practice, of course, many q-p encoders will combine steps 2 and 3,
especially if they "know" whether the object being encoded is text or binary.
That's fine as long as the result is the same.  But...

    It is very important to specify things in such a way that every 
     content-type has a well-defined canonical form that is independent 
     of content-transfer-encoding.

When specified this way, 

* it's easy to define how a content integrity check should work (it just gets
  computed over the canonical form, i.e. the output of step 2),

* it's easy to define how to convert from one encoding to another (undo steps
  4 through 2, and redo steps 2 through 4 with the new encoding), 

* the encoding of text body parts is consistent with the encoding of other
  body parts, and

* if the need arises to do so, it is easy to define a new
  content-transfer-encoding without changing the definition of any
  body part.
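
As a rough illustration of the first two points, the sketch below computes a
check over the canonical octets and converts a body from base64 to
quoted-printable.  SHA-256 and the function names are assumptions picked for
the example; nothing here is mandated by the draft.  It leans on Python's
standard base64 and quopri modules.

```python
import base64
import hashlib
import quopri

def integrity_check(canonical: bytes) -> str:
    """Digest of the canonical form; the hash algorithm is an arbitrary choice."""
    return hashlib.sha256(canonical).hexdigest()

def base64_to_qp(encoded: bytes) -> bytes:
    """Convert between encodings by passing through the canonical form."""
    canonical = base64.decodebytes(encoded)    # undo the old encoding
    return quopri.encodestring(canonical)      # apply the new encoding

canonical = b"Hello,\r\nworld\r\n"            # canonical text: CR LF line ends
as_base64 = base64.encodebytes(canonical)
as_qp = base64_to_qp(as_base64)

# The check depends only on the canonical octets, so it survives the
# change of content-transfer-encoding unchanged.
assert integrity_check(base64.decodebytes(as_base64)) == integrity_check(canonical)
assert integrity_check(quopri.decodestring(as_qp)) == integrity_check(canonical)
```

Because the digest never sees the encoded form, a gateway that re-encodes the
body does not have to touch the check value.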


The simple rule that lets this happen is:  in quoted-printable, octets 0D 0A
MAY be encoded as a ("hard") end-of-line, and when decoding, a "hard"
end-of-line ALWAYS means 0D 0A.  (If you want to be really strict, then say
that octets 0D 0A may only be encoded as end-of-line when they are intended
to represent an end-of-line in the native format text, but I think it is a
lot simpler to leave this rule out.)
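
The decoding half of that rule can be sketched as follows.  This is a toy
decoder written for this note (the name qp_decode is made up), assuming the
encoded body has already been split into lines using whatever end-of-line
convention the local system has, and that every line, including the last, was
terminated.  Malformed "=XX" sequences simply raise ValueError.

```python
def qp_decode(encoded_lines: list[str]) -> bytes:
    """Decode quoted-printable lines into canonical octets.

    A line ending in "=" is a soft line break and contributes no end-of-line.
    Any other line end is a hard end-of-line and ALWAYS yields 0D 0A,
    whatever terminator the line had in local storage.
    """
    out = bytearray()
    for line in encoded_lines:
        soft = line.endswith("=")
        if soft:
            line = line[:-1]                 # drop the soft-break marker
        i = 0
        while i < len(line):
            if line[i] == "=":
                out.append(int(line[i + 1:i + 3], 16))   # "=XX" -> one octet
                i += 3
            else:
                out.append(ord(line[i]))     # literal ASCII character
                i += 1
        if not soft:
            out += b"\r\n"                   # hard end-of-line means 0D 0A
    return bytes(out)

decoded = qp_decode(["a=", "b", "c=3Dd"])
assert decoded == b"ab\r\nc=d\r\n"
```

Note that the decoder never looks at the local line terminator at all, which
is what keeps the two kinds of end-of-line from getting confused.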

It also needs to be said that the canonical form of a text/* object is one
where end-of-line is always represented as a CR LF pair from the specified
character set.  This makes it clear how to encode a text/* object in base64.
(This rule may be extended to other content-types if the definition for that
content-type specifically says to use the text end-of-line rule.)
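
For instance, a base64 encoder for text might look like the sketch below.  The
function name and the use of Python codec names to stand in for the "specified
character set" are assumptions for illustration.  The point is that the CR LF
pair is inserted as characters first and only then turned into octets, so it
comes out in whatever representation the character set uses.

```python
import base64

def text_to_base64(native_text: str, charset: str = "us-ascii") -> bytes:
    """Canonicalize LF-delimited text to CR LF, then base64-encode the octets."""
    canonical_text = native_text.replace("\n", "\r\n")   # CR LF as characters
    canonical = canonical_text.encode(charset)           # octets in the charset
    return base64.encodebytes(canonical)

encoded = text_to_base64("line one\nline two\n")
assert base64.decodebytes(encoded) == b"line one\r\nline two\r\n"

# With a two-octet character set the pair is two characters, not two octets.
wide = text_to_base64("a\n", "utf-16-le")
assert base64.decodebytes(wide) == b"a\x00\r\x00\n\x00"
```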

Keith