Hi. Since the text of MIME is now being cleaned up for the next stage of
the standards process, I want to point out that appendix H does not use
the terminology coherently (this appendix was written independently while
the terminology was being checked in the rest of the text).
Here is the text of appendix H, with suggested corrections to make its use
of the terminology fully coherent.
Appendix H -- Canonical Encoding Model
There was some confusion, in earlier drafts of this memo,
regarding the model for when email data was to be converted
to canonical form and encoded, and in particular how this
process would affect the treatment of CRLFs, given that the
representation of newlines varies greatly from system to
system. For this reason, a canonical model for encoding is
presented below.
The process of composing a MIME message part can be modelled
as being done in a number of steps. Note that these steps
are roughly similar to those steps used in RFC1113:
========== "RFC1113:" -> "RFC1113, and are performed for each
========== 'innermost level' body."
Step 1. Creation of local form.
The body part to be transmitted is created in the system's
========== "body part" -> "body"
native format. The native character set is used, and where
appropriate local end of line conventions are used as well.
This may be a UNIX-style text file, or a Sun raster image, or
a VMS indexed file, or audio data in a system-dependent
format stored only in memory, or anything else that
corresponds to the local model for the representation of
some form of information.
Step 2. Conversion to canonical form.
The entire body part, including "out-of-band" information
========== "body part" -> "body"
such as record lengths and possibly file attribute
information, is converted to a universal canonical form.
The specific content type of the body part as well as its
========== "body part" -> "body"
associated attributes dictate the nature of the canonical
form that is used. Conversion to the proper canonical form
may involve character set conversion, transformation of
audio data, compression, or various other operations
specific to the various content types.
For example, in the case of text/plain data, the text must
be converted to a supported character set and lines must be
delimited with CRLF delimiters in accordance with RFC822.
Note that the restriction on line lengths implied by RFC822
is eliminated if the next step employs either quoted-
printable or base64 encoding.
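The following minimal Python sketch illustrates Step 2 for text/plain. It rewrites any local newline (LF, lone CR, or CRLF) as a CRLF delimiter and then encodes the text in a target character set. The function name to_canonical_text and the default charset "us-ascii" are assumptions made for this example only. Text containing characters outside the chosen charset raises UnicodeEncodeError instead of being converted.

```python
import re

# Any local newline convention: CRLF, lone CR, or lone LF (CRLF tried first).
_NEWLINE = re.compile(r"\r\n|\r|\n")

def to_canonical_text(local_text: str, charset: str = "us-ascii") -> bytes:
    """Hypothetical Step 2 helper for text/plain: CRLF-delimit lines, then encode.

    Raises UnicodeEncodeError if the text does not fit the charset.
    """
    crlf_text = _NEWLINE.sub("\r\n", local_text)
    return crlf_text.encode(charset)
```

For example, to_canonical_text("a\nb\rc\r\nd") yields b"a\r\nb\r\nc\r\nd".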
Step 3. Apply transfer encoding.
A Content-Transfer-Encoding appropriate for this body part
========== "body part" -> "body"
is applied. Note that there is no fixed relationship
between the content type and the transfer encoding. In
particular, it may be appropriate to base the choice of
base64 or quoted-printable on character frequency counts
which are specific to a given instance of body part.
========== "body part" -> "body"
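Step 3 leaves the choice of encoding to the implementation. One possible approach, sketched here with Python's standard base64 and quopri modules, counts the bytes that quoted-printable would have to escape and switches to base64 when they exceed a fraction of the body. The helper names and the one-third threshold are hypothetical and chosen only for illustration. Nothing in MIME fixes such a rule.

```python
import base64
import quopri

# Byte values quoted-printable can carry literally: TAB, LF, CR, SPACE,
# and printable ASCII except "=" (which QP must always escape).
_QP_LITERAL = {9, 10, 13, 32} | {b for b in range(33, 127) if b != ord("=")}

def choose_transfer_encoding(canonical: bytes, threshold: float = 1 / 3) -> str:
    """Choose a transfer encoding from a byte-frequency count.

    The threshold is a hypothetical tuning value, not taken from MIME.
    """
    if not canonical:
        return "quoted-printable"
    escaped = sum(1 for b in canonical if b not in _QP_LITERAL)
    # Each QP escape costs 3 bytes; base64 costs roughly 4/3 overall.
    return "base64" if escaped / len(canonical) > threshold else "quoted-printable"

def apply_transfer_encoding(canonical: bytes, cte: str) -> bytes:
    """Apply the chosen Content-Transfer-Encoding to canonical bytes."""
    if cte == "base64":
        # Produces 76-character lines, each terminated by LF.
        return base64.encodebytes(canonical)
    if cte == "quoted-printable":
        return quopri.encodestring(canonical)
    raise ValueError(f"unsupported transfer encoding: {cte}")
```

Note that base64.encodebytes ends its output lines with LF rather than CRLF. Its raw output is therefore not in RFC822 form without further newline conversion.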
Step 4. Insertion into message.
The encoded object is inserted into a MIME message with
========== "MIME message" -> "MIME entity"
appropriate body part headers and boundary markers.
========== "body part headers and boundary markers." ->
========== "headers. The entity is then inserted into
========== the body of a higher-level entity (message or multipart)
========== if needed."
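Step 4 can be sketched in the same spirit. The sketch prepends headers to an encoded body to form an entity. Where needed, it also places entities between boundary delimiters in the body of a multipart entity. The functions make_entity and make_multipart_body are hypothetical. The sketch assumes the encoded bodies already use CRLF line endings, and it omits the enclosing Content-Type header that would carry the boundary parameter.

```python
def make_entity(content_type: str, cte: str, encoded_body: bytes) -> bytes:
    """Prefix an encoded body with entity headers and the blank separator line."""
    headers = (
        f"Content-Type: {content_type}\r\n"
        f"Content-Transfer-Encoding: {cte}\r\n"
        "\r\n"
    )
    return headers.encode("us-ascii") + encoded_body

def make_multipart_body(boundary: str, entities: list[bytes]) -> bytes:
    """Place entities between boundary delimiters, ending with the close delimiter."""
    delimiter = f"--{boundary}\r\n".encode("us-ascii")
    # The CRLF appended after each entity is part of the next delimiter,
    # not of the entity itself.
    body = b"".join(delimiter + entity + b"\r\n" for entity in entities)
    return body + f"--{boundary}--\r\n".encode("us-ascii")
```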
It is vital to note that these steps are only a model; they
are specifically NOT a blueprint for how an actual system
would be built. In particular, the model fails to account
for two common designs:
1. In many cases the conversion to a canonical
form prior to encoding will be subsumed into the
encoder itself, which understands local formats
directly. For example, the local newline
convention for text bodyparts might be carried
========== "bodyparts" -> "bodies"
through to the encoder itself along with knowledge
of what that format is.
2. The output of the encoders may have to pass
through one or more additional steps prior to
being transmitted as a message. As such, the
output of the encoder may not be compliant with
the formats specified by RFC822. In particular,
once again it may be appropriate for the
converter's output to be expressed using local
newline conventions rather than using the standard
RFC822 CRLF delimiters.
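The first design above folds canonicalization into the encoder. A minimal sketch of that design, assuming the local convention is LF-only text, converts newlines to CRLF as part of base64 encoding. Following the second design, it lets the caller choose the newline placed between encoded lines. The function name and the out_newline parameter are illustrative assumptions. Decoding its output yields the same bytes as the model's Step 2 followed by Step 3, which is the consistency the text asks for.

```python
import base64

def encode_local_text_base64(local_text: bytes, out_newline: bytes = b"\n") -> bytes:
    """Hypothetical encoder that understands LF-delimited local text directly.

    Newlines are canonicalized to CRLF before encoding; encoded lines are
    joined with out_newline, which may be a local convention, not CRLF.
    """
    # Collapse any existing CRLF first so it is not doubled, then LF -> CRLF.
    canonical = local_text.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    encoded = base64.b64encode(canonical)
    # Base64 output lines are limited to 76 characters.
    lines = [encoded[i:i + 76] for i in range(0, len(encoded), 76)]
    return out_newline.join(lines) + (out_newline if lines else b"")
```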
Other implementation variations are conceivable as well.
The only important aspect of this discussion is that the
resulting messages are consistent with those produced by the
model described here.