ietf-822

Content-Type: text/paragraph. An alternative proposal

1998-02-16 06:35:56
From <draft-newman-mime-textpara-00.txt>

                   The Text/Paragraph Media Type

    The text/plain media type is defined to represent plain text where
    the CRLF sequence represents a line break [MIME-IMT].  Many modern
    computer systems have a different concept of ``plain text'' from
    the systems where the text/plain media type originated.  These
    modern systems usually use a proportional-spaced font and use CRLF
    to represent paragraph breaks.  Numerous software products have
    erroneously labelled this media type as text/plain.  In order to
    correct this interoperability problem, the text/paragraph media
    type is defined.


The draft then defines text/paragraph in a way that simply codifies the
existing (mal)practice. Existing MIME-compliant software would still
display messages that use the new media type with unreadably long lines.
As mentioned later in that draft, there may also be problems when such
messages are quoted (and requoted), and with signature files, which
usually include lines that are not meant to be wrapped.

This definition of text/paragraph is also unlikely to be taken up by the
software vendor mainly responsible for misusing text/plain in the
messages that we see. At least one of its offerings treats unknown
subtypes of text as application/octet-stream, so the vendor would be
unable to upgrade to text/paragraph without seriously breaking its
installed base.

Since the current drafting of text/paragraph does not seem to advance
the current position significantly, why don't we take the opportunity to
define a media type that caters for the real need: the ability, in a
backwards-compatible manner, to let compliant software "know" whether
lines may or may not be wrapped?

To be truly backwards compatible, a text/paragraph message body should
be displayed to a user of current MIME-compliant software in exactly the
same way as it would have been if it had been sent text/plain in the
first place. This means that there must be a limit on the length of
lines that are sent (probably 72 characters) and there must be no
visible "mark up" used to encode the various types of line endings.

The two different line types commonly found in mail or news are
those with pre-formatted line-breaks which are not expected to be
wrapped (quoted lines, signature lines including "-- " itself) and lines
containing new material from the author which are expected to be
wrapped. Other lines include lines of UUencoded material and lines
produced by PGP.

For the line-break encoding to be invisible to the end-user, it seems to
me that the encoding must be done by using trailing white space.

Thus, my proposal for text/paragraph would be that:

   significant line-breaks terminating pre-formatted lines would be
   encoded by preceding the CRLF with zero or one space character (the
   imprecision here being to cater for "-- ").

   significant line-breaks terminating lines ("paragraphs") intended to
   be wrapped by the viewing MUA would be encoded by preceding the CRLF
   by precisely two space characters.

   non-significant line-breaks ("soft" end-of-lines) inserted by the
   sending MUA for backwards-compatibility would be encoded by preceding
   the CRLF by three space characters. All "paragraphs" to be wrapped
   must be split into lines of no more than 72 characters using this
   "soft" end of line mechanism.

   when quoting from a message sent in text/paragraph, _all_ line-breaks
   (including soft eols) should be treated as pre-formatted, significant
   line-breaks and the trailing spaces encoding line-breaks within
   paragraphs should be removed. This will lead to the proper nesting of
   quoted material whether all MUAs in the conversation thread recognize
   text/paragraph or not.
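
To make the four rules above concrete, here is a minimal Python sketch of
how a sending and a viewing MUA might handle them. It is an illustration
under stated assumptions, not a definitive implementation. The function
names are hypothetical. The sketch assumes that the 72-character limit
includes the trailing spaces, that a soft end-of-line stands for a single
space between words, and that a line ending in four or more spaces is
treated as pre-formatted; the proposal itself does not settle any of
these points.

```python
import textwrap

WIDTH = 72  # limit suggested in the post; assumed to include trailing spaces


def _lines(body):
    """Split a CRLF-delimited body into lines, ignoring one final CRLF."""
    lines = body.split("\r\n")
    if lines and lines[-1] == "":
        lines.pop()
    return lines


def classify(line):
    """Return (text, kind) for one line whose CRLF has been removed."""
    stripped = line.rstrip(" ")
    trailing = len(line) - len(stripped)
    if trailing == 3:
        return stripped, "soft"   # non-significant break inside a paragraph
    if trailing == 2:
        return stripped, "para"   # significant break ending a paragraph
    # Zero or one trailing space (and, by assumption, four or more):
    # pre-formatted. The line is kept unchanged so that "-- " survives.
    return line, "pre"


def decode(body):
    """Turn a body into a list of ("pre" | "para", text) blocks."""
    blocks, pending = [], []
    for line in _lines(body):
        text, kind = classify(line)
        if kind == "soft":
            pending.append(text)
            continue
        if kind == "para":
            pending.append(text)
        if pending:
            # Assumption: each soft end-of-line replaced one space.
            blocks.append(("para", " ".join(pending)))
            pending = []
        if kind == "pre":
            blocks.append(("pre", text))
    if pending:  # paragraph that was never terminated; flush it anyway
        blocks.append(("para", " ".join(pending)))
    return blocks


def encode_paragraph(text):
    """Split one paragraph into lines using soft end-of-lines."""
    pieces = textwrap.wrap(text, width=WIDTH - 3,
                           break_long_words=False,
                           break_on_hyphens=False) or [""]
    # Three spaces mark a soft break, two spaces end the paragraph.
    return [p + "   " for p in pieces[:-1]] + [pieces[-1] + "  "]


def quote(body, prefix="> "):
    """Quote a body: every line becomes a pre-formatted line."""
    return "\r\n".join(prefix + classify(line)[0] for line in _lines(body))
```

A viewing MUA would rewrap the "para" blocks returned by decode() to the
window width and display the "pre" blocks unchanged. Note that
encode_paragraph() can emit a line longer than 72 characters if the
paragraph contains a single word longer than that; a real implementation
would need a policy for such words.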

Encoding considerations:        

   text/paragraph messages would be entirely suitable for 7/8-bit
   encoding whatever the length of the lines in the "paragraphs".
   Indeed, if a message is suitable for 7-bit transmission with the
   default character set, it could even be safely sent to non-MIME
   recipients or arenas (especially UseNet).

   it is said that some MTAs or gateways routinely strip trailing white-
   space or even pad lines with white space. The effect of the former is
   simply to reduce the message back to a text/plain equivalent. The
   effect of the latter would easily be spotted from the pattern of
   white space before the line endings. Either effect could be finessed
   by using quoted-printable encoding (but then the messages would never
   be suitable for sending to non-MIME recipients). "Munging" of
   trailing white-space does not seem to pose a significant problem
   here.
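
As a rough illustration of the two kinds of "munging" described above,
the following Python sketch checks a received body for each. The
function names and the padding heuristic (every line has the same
length and at least one ends in more than three spaces) are assumptions
made for this example; the proposal does not specify how padding should
be detected.

```python
def trailing_spaces(line):
    """Count the space characters at the end of a line."""
    return len(line) - len(line.rstrip(" "))


def looks_stripped(lines):
    """True when no line carries any trailing space at all.

    Such a body reads as all pre-formatted lines, that is, as the
    text/plain equivalent of the original message.
    """
    return all(trailing_spaces(line) == 0 for line in lines)


def looks_padded(lines):
    """Heuristic guess that a gateway padded every line to a fixed width."""
    same_length = len({len(line) for line in lines}) == 1
    return (len(lines) > 1 and same_length
            and any(trailing_spaces(line) > 3 for line in lines))


original = ["Some new text that   ", "may be rewrapped.  ", "-- ", "Ian"]
stripped = [line.rstrip(" ") for line in original]  # what a stripping MTA does
padded = [line.ljust(80) for line in original]      # what a padding MTA might do
```

Here looks_stripped(stripped) and looks_padded(padded) are both true,
while neither check fires on the original lines.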

Display considerations: 

   since pre-formatted lines may have been formatted using fixed-pitch
   fonts (especially lines from signature files), MUAs may choose to
   display preformatted lines in a fixed pitch font while displaying
   paragraphs in a proportional font.

Conclusion

I don't think that the definition of text/paragraph described in the
first draft is very useful. However, if my ideas on line-break encoding
are acceptable, I believe that text/paragraph would be more useful to
modern MUAs than text/plain and more widely usable than text/html. There
seem to be no down-sides compared to text/plain and the up-side is that
email messages and UseNet articles could be displayed in modern
proportional fonts while preserving the layout of quoted material,
signatures and even embedded tables.


New functionality, fully backwards-compatible, with no down-sides - what
have I missed? :-)

-- 
Ian Bell                                           T U R N P I K E  Ltd