ietf-822

Quoted-printable specification.

1991-12-17 02:54:23
I have felt quite uneasy with the definition of Quoted-printable for a
long time, for several reasons. The global structure does not follow a
more general -> more particular order in the statement of the rules.
Probably as a consequence, I find several contradictions in the text. I
include below a commented version of the current text (only the main
problem areas are shown), and a 'proposal' for an alternative,
restructured text. Please note that the alternative text is intended to
represent exactly the same specification; I am *not* trying to change
anything!
                                                              /AF

Comments on the current version :

  In this encoding, octets with decimal values of  33  through
  57   inclusive,  and  59  through  126,  inclusive,  MAY  be
-- 57 should read 60, and 59 should read 62.
  represented as the  ASCII  characters  which  correspond  to
  those  octets  (EXCLAMATION  POINT  through  LESS  THAN, and
  GREATER THAN through TILDE, respectively). All other values,
  including  those  corresponding  to  ASCII SPACE (32), EQUAL
  SIGN (61), DEL (127), all  octets  less  than  32,  and  all
  octets greater than 127, are to be represented as determined
  by the following rules:

       Rule #1: (General 8-bit representation)  Any octet  may
       be  represented  by  an  "="  followed  by  a two digit
       hexadecimal representation of the octet's  value.   The
-- Not true for 13 and 10 representing a line break.
       digits  of  the hexadecimal alphabet, for this purpose,
       are  "0123456789ABCDEF".   Uppercase   letters   should
       always be used when sending hexadecimal data, though an
       implementation  may  choose  to   recognize   lowercase
       letters  on  receipt.   Thus, for example, the value 12
       can be represented by "=0C", and the  value  61  (ASCII
       EQUAL  SIGN)  can  be  represented by "=3D". Rule #1 is
       optional for octets with values of 9 (e.g. ASCII TAB or
       HT),  10  (ASCII LINEFEED), 13 (ASCII CARRAIGE RETURN),
       and 32  through  126  (SPACE  through  TILDE),  and  is
       REQUIRED for all other values.
-- The last sentence is not true:
--  - special rules apply for 9, 10, 13 and 32; rule #1 is *not* simply
--    optional for them;
--  - 61 is between 32 and 126, and requires special treatment.

 ---------------------------------------------------------------------
Alternative, restructured text proposal :

5.1  Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent  data
that largely consists of octets that correspond to printable
characters in the ASCII character set.  It encodes the  data
in  such  a way that the resulting octets are unlikely to be
modified by mail transport.  If the data being  encoded  are
mostly  ASCII  text,  the  encoded  form of the data remains
largely recognisable by humans.  A message which is entirely
ASCII  may also be encoded in Quoted-Printable to ensure the
integrity of the data should  the  message  pass  through  a
character-translating, and/or line-wrapping gateway.

In this encoding, octets are to be represented as determined
by the following rules:

     Rule  #1:  (General  8-bit representation)  Any  octet,
     except those  indicating a line break  according to the
     local newline convention, may  be represented by an "="
     followed by  a two digit hexadecimal  representation of
     the  octet's  value.  The  digits  of  the  hexadecimal
     alphabet,  for  this purpose,  are  "0123456789ABCDEF".
     Uppercase letters  should always  be used  when sending
     hexadecimal data,  though an implementation  may choose
     to recognize  lowercase letters  on receipt.  Thus, for
     example, the value 12 can  be represented by "=0C", and
     the value 61  (ASCII EQUAL SIGN) can  be represented by
     "=3D".  Except  when  the   following  rules  allow  an
     alternative encoding, this rule is mandatory.
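
A minimal sketch of Rule #1 in Python, for illustration only. The
function names encode_octet and decode_octet are hypothetical and do
not come from the specification; the sketch emits uppercase digits and
tolerates lowercase digits on receipt, as the rule suggests.

```python
HEX_DIGITS = "0123456789ABCDEFabcdef"


def encode_octet(value: int) -> str:
    """Return the "=XX" form of one octet, using uppercase hex digits."""
    if not 0 <= value <= 255:
        raise ValueError("an octet must be in the range 0..255")
    return "=%02X" % value


def decode_octet(token: str) -> int:
    """Parse an "=XX" token; lowercase digits are accepted on receipt."""
    if len(token) != 3 or token[0] != "=":
        raise ValueError("expected '=' followed by two hex digits")
    if any(c not in HEX_DIGITS for c in token[1:]):
        raise ValueError("expected two hexadecimal digits after '='")
    return int(token[1:], 16)
```

With these definitions, encode_octet(12) gives "=0C" and
encode_octet(61) gives "=3D", matching the examples in the rule.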

     Rule #2:  (Literal representation) Octets  with decimal
     values of 33 through 60  inclusive, and 62 through 126,
     inclusive, MAY  be represented as the  ASCII characters
     which  correspond to  those  octets (EXCLAMATION  POINT
     through  LESS THAN,  and  GREATER  THAN through  TILDE,
     respectively).
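
The ranges in Rule #2 can be sketched as a simple predicate. This is an
illustration under assumptions: may_be_literal and encode_printable are
hypothetical names, and the sketch treats its input as plain octets, so
it ignores the white space and line break rules that follow.

```python
def may_be_literal(value: int) -> bool:
    """True if Rule #2 allows the octet to appear as its own ASCII character."""
    return 33 <= value <= 60 or 62 <= value <= 126


def encode_printable(data: bytes) -> str:
    """Encode octets with Rule #2 where allowed and Rule #1 otherwise.

    White space and line breaks (Rules #3 to #5) are deliberately not
    handled here, so TAB, SPACE, CR and LF all come out in "=XX" form.
    """
    return "".join(
        chr(b) if may_be_literal(b) else "=%02X" % b
        for b in data
    )
```

For instance, encode_printable(b"a=b") yields "a=3Db": the EQUAL SIGN
(61) falls in the gap between the two literal ranges and so takes the
Rule #1 form.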

     Rule #3: (White Space): Octets  with values of 9 and 32
     MAY  be  represented  as   ASCII  TAB  (HT)  and  SPACE
     characters,   respectively,   but   MUST  NOT   be   so
     represented at the end of an encoded line. Any TAB (HT)
     or SPACE  characters on  an encoded  line MUST  thus be
     followed  on that  line  by a  printable character.  In
     particular,  an "="  at  the end  of  an encoded  line,
     indicating a soft  line break (see rule  #5) may follow
     one or  more TAB (HT)  or SPACE characters.  It follows
     that octets with  values 9 and 32 appearing  at the end
     of  an encoded  line must  be represented  according to
     Rule #1. This  rule is necessary because  some MTAs are
     known to pad lines of  text with SPACEs, and others are
     known to  remove "white space" characters  from the end
     of a line. Therefore,  when decoding a Quoted-Printable
     message, any  trailing white  space on  a line  must be
     deleted,  as it  will  necessarily have  been added  by
     intermediate transport agents.
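
Both halves of Rule #3 can be sketched as follows. The helper names are
hypothetical, and each function is assumed to receive a single encoded
line without its CRLF. Only the last TAB or SPACE needs the Rule #1
form, because any earlier white space is then followed by a printable
character.

```python
def protect_trailing_whitespace(encoded_line: str) -> str:
    """Encoder side: make sure an encoded line does not end in TAB or SPACE."""
    if encoded_line.endswith(" "):
        return encoded_line[:-1] + "=20"
    if encoded_line.endswith("\t"):
        return encoded_line[:-1] + "=09"
    return encoded_line


def strip_transport_padding(received_line: str) -> str:
    """Decoder side: drop trailing TABs and SPACEs added in transit."""
    return received_line.rstrip(" \t")
```

Because the encoder never leaves literal white space at the end of a
line, the decoder can delete whatever it finds there without losing
data.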

     Rule #4 (Line Breaks): A line break, whatever its
     representation under the local newline convention,
     must be represented by an RFC 822 line
     break,   which    is   a   CRLF   sequence,    in   the
     Quoted-Printable encoding. If isolated  CRs and LFs, or
     LF  CR and  CR LF  sequences are  allowed to  appear in
     binary data  according to local conventions,  they must
     be  represented using  the "=0D",  "=0A", "=0A=0D"  and
     "=0D=0A" notations respectively.
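
One possible reading of Rule #4, sketched in Python. The function name
and its parameters (local_newline, is_text) are assumptions made for
this illustration. Octets other than CR and LF are passed through
unchanged here; a real encoder would apply the other rules to them.

```python
def encode_line_breaks(data: bytes,
                       local_newline: bytes = b"\n",
                       is_text: bool = True) -> str:
    """Map local line breaks to CRLF and stray CR/LF octets to "=0D"/"=0A"."""
    if is_text and not local_newline:
        raise ValueError("local_newline must not be empty for text data")
    out = []
    i = 0
    while i < len(data):
        if is_text and data.startswith(local_newline, i):
            # A genuine line break becomes the RFC 822 CRLF sequence.
            out.append("\r\n")
            i += len(local_newline)
        elif data[i] in (0x0D, 0x0A):
            # An isolated CR or LF is data, not a line break.
            out.append("=%02X" % data[i])
            i += 1
        else:
            # Left as-is in this sketch; the other rules are out of scope.
            out.append(chr(data[i]))
            i += 1
    return "".join(out)
```

With is_text=False every CR and LF is treated as data, so b"\r\n"
becomes "=0D=0A" and b"\n\r" becomes "=0A=0D", as the rule describes.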

     Rule  #5  (Soft   Line  Breaks):  The  Quoted-Printable
     encoding REQUIRES that encoded lines be no more than 76
     characters long. If longer lines are to be encoded with
     the Quoted-Printable encoding,  'soft' line breaks must
     be  used. An  equal sign  as  the last  character on  an
     encoded line indicates  such a non-significant ('soft')
     line break in the encoded  text. Thus if the "raw" form
     of the line is a single line that says:

     Now's the time for all folk to come to the aid of their
     country.

     This  can  be  represented,  in  the   Quoted-Printable
     encoding, as

     Now's the time =
     for all folk to come=
      to the aid of their country.

     This  provides a  mechanism with  which long  lines are
     encoded in  such a way  as to  be restored by  the user
     agent.  The  76  character  limit does  not  count  the
     trailing  CRLF,   but  counts  all   other  characters,
     including any equal signs.
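
A sketch of Rule #5, under stated assumptions: the input to
add_soft_breaks is one logical line that has already been encoded with
the earlier rules and contains no CRLF, and both function names are
hypothetical. The splitter avoids cutting through an "=XX" token and
counts the trailing "=" toward the 76 character limit.

```python
from typing import List

MAX_LINE = 76  # limit from Rule #5, not counting the trailing CRLF


def add_soft_breaks(encoded: str, limit: int = MAX_LINE) -> List[str]:
    """Split one encoded logical line into pieces of at most `limit` characters."""
    if limit < 4:
        raise ValueError("limit too small to hold an '=XX' token and a soft break")
    lines = []
    while len(encoded) > limit:
        cut = limit - 1  # leave room for the "=" that marks the soft break
        # Move the cut back if it would fall inside an "=XX" token.
        if encoded[cut - 1] == "=":
            cut -= 1
        elif encoded[cut - 2] == "=":
            cut -= 2
        lines.append(encoded[:cut] + "=")
        encoded = encoded[cut:]
    lines.append(encoded)
    return lines


def remove_soft_breaks(lines: List[str]) -> str:
    """Rejoin the pieces of one logical line, dropping soft line breaks."""
    parts = []
    for line in lines:
        line = line.rstrip(" \t")  # trailing white space was added in transit
        parts.append(line[:-1] if line.endswith("=") else line)
    return "".join(parts)
```

Applied to the three encoded lines of the example above,
remove_soft_breaks restores the original single line.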
