ietf-822

newline encoding considered harmful

1992-12-14 06:37:58
Sorry for the provocative title, but I wanted to have your attention :-)

In the past there has been some confusion about the interaction of local
newline representations and transfer encodings. Thanks to those raising
this on the mailing list back then (sorry, I don't remember the names offhand),
RFC-MIME is now very explicit on the matter (at least to me), even
though some of you felt it was unnecessary to add the explicit wording
(at that time I was one of them, but not anymore), since everything said
was already implied by the rest of the RFC.

However, the three implementations that I looked at (is anyone keeping a
list of MIME implementations?) all seem to be WRONG in this respect.
I looked primarily at mmencode from the metamail package, but metamail
itself, mh-mime and c-client seem to share the deficiencies.

Specifically, I believe MIME requires the following to take place during
transfer encoding/decoding (regarding newline conversions):

text -> base64

        Local newlines MUST be converted to CRLF sequences before encoding.

base64 -> text

        CRLF sequences after decoding MUST be converted to local newlines.

binary -> base64, base64 -> binary

        No special requirements.

text -> quoted-printable

        Local newlines MUST be converted to hard newlines (newlines not
        preceded by an equals sign; those are soft newlines).

        A robust implementation could also convert =0D=0A sequences to local
        newlines when decoding (since we are dealing with text, they represent
        newlines in the canonical encoding). But an error message
        is probably more appropriate, since the encoding agent SHOULD
        have used a hard newline instead.

quoted-printable -> text

        Hard newlines MUST be converted to local newlines.

binary -> quoted-printable

        Hard newlines MUST NOT be used in the encoding. In particular,
        byte sequences that would represent a local newline in text MUST
        NOT be encoded as a hard newline.
        
quoted-printable -> binary

        No special requirements.

        A robust implementation could convert hard newlines to
        CRLF sequences. But again an error message is probably better,
        since the encoding agent SHOULD NOT have used hard newlines.
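
To make the base64 cases above concrete, here is a minimal sketch in Python. It is not taken from any of the implementations discussed; the function names and the is_text and local_newline parameters are my own assumptions, and the caller is assumed to know whether the data is text.

```python
import base64
import os


def encode_base64(data: bytes, is_text: bool,
                  local_newline: bytes = os.linesep.encode()) -> bytes:
    """Base64-encode data; text is first put into canonical (CRLF) form."""
    if is_text:
        # text -> base64: local newlines become CRLF before encoding.
        data = data.replace(local_newline, b"\r\n")
    # binary -> base64: the bytes are encoded exactly as they are.
    return base64.encodebytes(data)


def decode_base64(encoded: bytes, is_text: bool,
                  local_newline: bytes = os.linesep.encode()) -> bytes:
    """Base64-decode data; for text, CRLF becomes the local newline."""
    data = base64.decodebytes(encoded)
    if is_text:
        # base64 -> text: CRLF sequences become local newlines after decoding.
        data = data.replace(b"\r\n", local_newline)
    return data
```

On a system whose local newline is a bare LF, encode_base64(b"a\nb\n", True, b"\n") encodes the bytes "a", CR, LF, "b", CR, LF, while the same call with is_text=False encodes the LF bytes untouched.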

This basically boils down to the following:

        * Encoding/decoding MUST be done from the CANONICAL
          representation of the data (which for text means representing
          newlines as CRLF sequences).
          
        * In the quoted-printable encoding, hard newlines should be
          treated as representing the two byte sequence 0D 0A.
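
The second point can be illustrated with a small quoted-printable sketch in Python. This is only my reading of the rules above, not code from any existing package: qp_encode, qp_decode and their parameters are hypothetical names, the decoder assumes well-formed input, and it takes the "error message" route for hard newlines in binary data. It does not handle the =0D=0A case in text.

```python
def _encode_line(line: bytes) -> str:
    """Encode one line's bytes, adding soft breaks so no output line exceeds 76 chars."""
    tokens = []
    for i, b in enumerate(line):
        is_last = i == len(line) - 1
        # Printable ASCII except "=" stays literal; space/tab only when not at line end.
        literal = (33 <= b <= 126 and b != 0x3D) or (b in (0x20, 0x09) and not is_last)
        tokens.append(chr(b) if literal else "=%02X" % b)
    pieces, current = [], ""
    for tok in tokens:
        if len(current) + len(tok) > 75:  # keep one column free for the soft-break "="
            pieces.append(current + "=")
            current = ""
        current += tok
    pieces.append(current)
    return "\r\n".join(pieces)


def qp_encode(data: bytes, is_text: bool, local_newline: bytes = b"\n") -> str:
    """Text: local newlines become hard newlines. Binary: no hard newlines at all."""
    lines = data.split(local_newline) if is_text else [data]
    return "\r\n".join(_encode_line(line) for line in lines)


def qp_decode(encoded: str, is_text: bool, local_newline: bytes = b"\n") -> bytes:
    """Text: hard newlines become local newlines. Binary: a hard newline is an error."""
    out = bytearray()
    lines = encoded.split("\r\n")
    for idx, line in enumerate(lines):
        soft = line.endswith("=")
        # A soft break is dropped; otherwise trailing whitespace is ignored.
        line = line[:-1] if soft else line.rstrip(" \t")
        i = 0
        while i < len(line):
            if line[i] == "=":
                out.append(int(line[i + 1:i + 3], 16))
                i += 3
            else:
                out.append(ord(line[i]))
                i += 1
        if not soft and idx < len(lines) - 1:
            if not is_text:
                raise ValueError("hard newline in binary quoted-printable data")
            out += local_newline
    return bytes(out)
```

With these definitions, a byte 0A in binary data is emitted as =0A, whereas the same byte in text on an LF system becomes a hard newline in the encoded output.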

The mmencode program violates both of these (and so does metamail, I
think, and the others too probably): in base64 encoding/decoding, it does not
treat newlines specially (not even if encoding text), while in
quoted-printable encoding/decoding it does treat them specially (even if
encoding non-text). This catches the two most common cases, BUT IS
DEFINITELY WRONG.

For mmencode I would suggest that a -t option be added and clear wording
about its use in the manpage:

        When encoding/decoding textual data, the use of the -t option is
        REQUIRED in order to properly treat newlines in the data.
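
As a rough illustration of what such an option would do, here is a Python sketch of a tiny base64 filter with a -t flag. It is purely hypothetical and does not describe mmencode's actual interface: the program name, the -u decode flag and the run helper are assumptions made for the example.

```python
import argparse
import base64


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical command line; not the real mmencode option set.
    parser = argparse.ArgumentParser(prog="encode-sketch")
    parser.add_argument("-u", dest="decode", action="store_true",
                        help="decode instead of encode")
    parser.add_argument("-t", dest="text", action="store_true",
                        help="treat the data as text (convert newlines)")
    return parser


def run(argv, data: bytes, local_newline: bytes = b"\n") -> bytes:
    """Apply base64 encoding or decoding to data according to the flags in argv."""
    args = build_parser().parse_args(argv)
    if args.decode:
        decoded = base64.decodebytes(data)
        # With -t, canonical CRLF becomes the local newline after decoding.
        return decoded.replace(b"\r\n", local_newline) if args.text else decoded
    if args.text:
        # With -t, local newlines become canonical CRLF before encoding.
        data = data.replace(local_newline, b"\r\n")
    return base64.encodebytes(data)
```

Here run(["-t"], b"hi\n") encodes "hi" followed by CRLF, while run([], b"hi\n") encodes the bare LF, which is only correct for binary data.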

However, I think that mmencode is intrinsically too limited. What is
required is a program that acts on complete MIME messages (which could
then have an option -h for treating bodies only).

For metamail itself, options are not necessary for the treatment of
content-types text/*, since metamail knows that these must be treated
like text (even though it currently doesn't pass this on to the decoding
routines). Other content-types should in general be treated as binary
data. There may be content-types, however, where conversion is
sometimes necessary; for example application/postscript or
application/x-unix-shell or things like that. So it would be beneficial
if metamail could be told to treat these as text when writing them to a
file (ideally from the mailcap file, although prompting the user may be
necessary for some types, e.g. application/postscript).
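
The decision described above could be sketched as follows in Python. This is not metamail code: treat_as_text and the text_overrides set are hypothetical, standing in for whatever per-type setting a mailcap entry or a user prompt would supply.

```python
def treat_as_text(content_type: str, text_overrides=frozenset()) -> bool:
    """Decide whether a body should get newline conversion when decoded."""
    # Drop parameters such as "; charset=us-ascii" and normalize case.
    base_type = content_type.split(";", 1)[0].strip().lower()
    if base_type.startswith("text/"):
        # text/* is always text; no option is needed.
        return True
    # Everything else is binary unless explicitly configured as text.
    return base_type in text_overrides
```

For example, treat_as_text("application/postscript") is False by default but becomes True when called with text_overrides={"application/postscript"}.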

The same probably applies mutatis mutandis to mh-mime and c-client.
--
Luc Rooijakkers                                 Internet: lwj(_at_)cs(_dot_)kun(_dot_)nl
Faculty of Mathematics and Computer Science     UUCP: uunet!cs.kun.nl!lwj
University of Nijmegen, the Netherlands         tel. +3180652271
