ietf-822

Text cleaning.

1993-01-20 05:58:02
Hi. Since the text of MIME is now being cleaned up for the next stage of the
standards process, I want to point out that appendix H does not use the
terminology consistently (this appendix was being written independently
while the terminology was being checked in the rest of the text).

Here is the text of appendix H, with suggested corrections to make its use
of the terminology fully coherent.




            Appendix H -- Canonical Encoding Model



            There was some confusion, in earlier drafts  of  this  memo,
            regarding  the model for when email data was to be converted
            to canonical form and encoded, and in  particular  how  this
            process  would affect the treatment of CRLFs, given that the
            representation of newlines varies  greatly  from  system  to
            system.   For this reason, a canonical model for encoding is
            presented below.

            The process of composing a MIME message part can be modelled
            as  being  done in a number of steps.  Note that these steps
            are roughly similar to those steps used in RFC1113:
==========                                                    , and are
==========  performed for each 'innermost level' body.
            Step 1.  Creation of local form.

            The body part to be transmitted is created in  the  system's
==========      | body  |
            native format.   The native character set is used, and where
            appropriate local end of line conventions are used as  well.
            This may be a UNIX-style text file, or a Sun raster image, or
            a VMS indexed file, or  audio  data  in  a  system-dependent
            format   stored  only  in  memory,  or  anything  else  that
            corresponds to the local model  for  the  representation  of
            some form of information.

            Step 2.  Conversion to canonical form.

            The entire body part,  including  "out-of-band"  information
==========             |  body |
            such   as   record   lengths  and  possibly  file  attribute
            information, is converted to  a  universal  canonical  form.
            The  specific  content  type of the body part as well as its
==========                                      |  body |
            associated attributes dictate the nature  of  the  canonical
            form  that is used.  Conversion to the proper canonical form
            may involve  character  set  conversion,  transformation  of
            audio   data,   compression,  or  various  other  operations
            specific to the various content types.

            For example, in the case of text/plain data, the  text  must
            be  converted to a supported character set and lines must be
            delimited with CRLF delimiters in  accordance  with  RFC822.
            Note  that the restriction on line lengths implied by RFC822
            is eliminated  if  the  next  step  employs  either  quoted-
            printable or base64 encoding.
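
As an illustration of Step 2 for text/plain data, here is a minimal Python
sketch. It assumes the local form is an in-memory string whose line ends may
be LF, CR, or CRLF, and that US-ASCII is the target character set; the
function name to_canonical_text is hypothetical and not part of the memo.

```python
import re


def to_canonical_text(local_text: str, charset: str = "us-ascii") -> bytes:
    """Return the text with every line end rewritten as CRLF, then encoded.

    Assumption: the local newline convention is one of LF, CR, or CRLF.
    """
    # CRLF is listed first so an existing CRLF is matched as one unit
    # and is not turned into two line breaks.
    canonical = re.sub(r"\r\n|\r|\n", "\r\n", local_text)
    # Raises UnicodeEncodeError if the text does not fit the chosen charset.
    return canonical.encode(charset)


print(to_canonical_text("first line\nsecond line\n"))
```

A real converter would also have to deal with record lengths, file
attributes, and character set mapping, which this sketch leaves out.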

            Step 3.  Apply transfer encoding.

            A Content-Transfer-Encoding appropriate for this  body  part
==========                                                    |  body  |
            is  applied.   Note  that  there  is  no  fixed relationship
            between the content  type  and  the  transfer  encoding.  In
            particular,  it  may  be  appropriate  to base the choice of
            base64 or quoted-printable  on  character  frequency  counts
            which are specific to a given instance of body part.
==========                                            |  body |
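
One possible way to base the choice on character frequency counts is
sketched below in Python. The function name and the default threshold are
assumptions made for illustration. The threshold of 1/6 is only a rough size
break-even: quoted-printable turns each escaped octet into three characters,
while base64 grows the data by about one third, and soft line breaks are
ignored here.

```python
def choose_transfer_encoding(canonical: bytes, threshold: float = 1 / 6) -> str:
    """Pick an encoding from the share of octets quoted-printable must escape.

    Hypothetical heuristic: this is not prescribed by the memo.
    """
    if not canonical:
        return "quoted-printable"
    escaped = sum(
        1
        for octet in canonical
        # Octets above 126, "=" itself, and control characters other than
        # TAB, LF, and CR need an "=XX" escape in quoted-printable.
        if octet > 126 or octet == 0x3D or (octet < 32 and octet not in (0x09, 0x0A, 0x0D))
    )
    return "base64" if escaped / len(canonical) > threshold else "quoted-printable"
```

Mostly readable text stays readable under quoted-printable, while data
dominated by 8-bit octets ends up smaller under base64.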

            Step 4.  Insertion into message.

            The encoded object is inserted  into  a  MIME  message  with
==========                                                 |entity|
            appropriate body part headers and boundary markers.
==========              |headers. The entity is then inserted into
==========  the body of a higher-level entity    (message or multipart)
==========  if needed.
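
Steps 3 and 4 can be sketched with the Python standard library email
package, assuming base64 was the chosen encoding. The helper names
make_entity and wrap_in_multipart, the boundary string, and the sample body
are hypothetical.

```python
import base64
from email.message import Message
from email.mime.multipart import MIMEMultipart


def make_entity(canonical: bytes, content_type: str) -> Message:
    """Encode a canonical-form body and attach the entity headers to it."""
    entity = Message()
    entity["Content-Type"] = content_type
    entity["Content-Transfer-Encoding"] = "base64"
    # The payload is stored already encoded, as 7-bit text.
    entity.set_payload(base64.encodebytes(canonical).decode("ascii"))
    return entity


def wrap_in_multipart(entities, boundary="example-boundary"):
    """Insert the entities into the body of a higher-level multipart entity."""
    outer = MIMEMultipart("mixed", boundary=boundary)
    for entity in entities:
        outer.attach(entity)
    return outer


part = make_entity(b"line one\r\nline two\r\n", 'text/plain; charset="us-ascii"')
message = wrap_in_multipart([part])
print(message.as_string())
```

The printed output shows the entity headers, the encoded body, and the
boundary markers that the enclosing multipart adds around it.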

            It is vital to note that these steps are only a model;  they
            are  specifically  NOT  a blueprint for how an actual system
            would be built.  In particular, the model fails  to  account
            for two common designs:

                 1.  In many cases the conversion  to  a  canonical
                 form  prior  to encoding will be subsumed into the
                 encoder itself, which  understands  local  formats
                 directly.    For   example,   the   local  newline
                 convention for text  bodyparts  might  be  carried
==========                            |bodies |
                 through to the encoder itself along with knowledge
                 of what that format is.

                 2.  The output of the encoders may  have  to  pass
                 through  one  or  more  additional  steps prior to
                 being transmitted as  a  message.   As  such,  the
                 output  of  the  encoder may not be compliant with
                 the formats specified by RFC822.   In  particular,
                 once   again   it   may  be  appropriate  for  the
                 converter's output to  be  expressed  using  local
                 newline conventions rather than using the standard
                 RFC822 CRLF delimiters.

            Other implementation variations  are  conceivable  as  well.
            The  only  important  aspect  of this discussion is that the
            resulting messages are consistent with those produced by the
            model described here.
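
To make the last point concrete, the Python sketch below compares an
encoder that follows the model literally with one that reads local lines
directly and never builds the canonical form as a whole. It assumes a local
convention of LF-terminated US-ASCII lines; both function names are
hypothetical.

```python
import base64
import re


def model_encode(local_text: str) -> bytes:
    """Follow the model: build the canonical form first, then encode it."""
    canonical = re.sub(r"\r\n|\r|\n", "\r\n", local_text).encode("ascii")
    return base64.b64encode(canonical)


def fused_encode(local_lines) -> bytes:
    """Encode LF-terminated local lines directly, adding CRLF on the fly."""
    out = []
    pending = b""
    for line in local_lines:
        pending += line.rstrip("\n").encode("ascii") + b"\r\n"
        # Encode only whole 3-octet groups so no padding appears mid-stream.
        usable = len(pending) - len(pending) % 3
        out.append(base64.b64encode(pending[:usable]))
        pending = pending[usable:]
    out.append(base64.b64encode(pending))  # leftover octets, padded if needed
    return b"".join(out)


text = "alpha\nbeta\ngamma\n"
same = fused_encode(text.splitlines(keepends=True)) == model_encode(text)
print(same)
```

The two encoders take different routes but produce identical output, which
is the only property the model requires.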

  • Text cleaning., Alain FONTAINE (Post master - UCL) <=