RFC-MNEM draft RFC on quoted-readable/Mnemonic text

Here is the Mnemonic RFC draft (also known as the
quoted-readable text type in RFC-XXXX).
It refers to another RFC on character sets that I hesitate to
distribute to everybody, as it is almost 200 kb.

Keld
---








     Network Working Group             Philippe-Andre Prindeville
     Request For Comments: DRAFT                    Telecom Paris
                                                    Keld Simonsen
                                          Danish Unix Users Group
                                                        July 1991


                         Mnemonic Text Format


     Status of the Memo

     This memo specifies an encoding format that permits the
     exchange of textual messages consisting of characters from a
     wide range of international scripts, including Latin, Cyril-
     lic, Greek, Arabic, Hebrew, Katakana, Hiragana and some spe-
     cial characters.

     The encoding is designed so that originating and receiving
     end-user equipment with different capabilities still have a
     good chance of communicating intelligibly, because a
     readable, unambiguous mnemonic fallback is defined.

     This supplements the set defined in [1], and in RFC-XXXX
     [2], where the format specified in this memo is known as
     Text/Quoted-Readable.  The format defined in this memo is
     the preferred format for exchanging alphabetic messages. The
     memo uses the definitions about characters and coded charac-
     ter sets as specified in RFC-CHAR [3].

     Distribution of this memo is unlimited.

     Acknowledgements

     This memo was inspired by [1],[4], and [5], as well as by
     conversations with Justin Bur of l'E'cole Polytechnique
     Federale de Lausanne (EPFL) and people active within
     l'Association Franc,ais d'Utilisateurs Unix (AFUU), and les
     Re'seaux Associe's pour la Recherche Europeen (RARE).

     Introduction

     As the Internet grows in size, the number and scope of users
     increases.  TCP/IP has become a player in the pan-European
     networking arena [6], and the number of other networks that
     the Internet connects to is also mounting.  In short, the
     Internet is becoming Internationalized.  With this expanding
     circle of users, the breadth of their needs similarly
     increases.

     One of the most popular services of the Internet, electronic
     mail (email), is also perhaps one of the least adequate to
     meet this new demand.  Issues of addressing and gatewaying


     Prindeville & Simonsen                              [Page 1]







     RFC-MNEM            Mnemonic Text Format           July 1991


     have been conceived and implemented, but email still bears
     the constraint that messages be composed of the 7-bit ASCII
     graphical character set.  For non-Anglophones, this is sim-
     ply not adequate.  This memo defines techniques to meet this
     international demand.

     For the remainder of this document, we shall take the subset
     of ASCII characters that have ISO, EBCDIC, and Teletext
     equivalents, and refer to it as ICS (Invariant Character
     Set, equivalent to invariant ISO 646 [7]).  We regard this
     as the minimal universal character set.  Its contents are
     given in RFC-CHAR as the ISO_646.inv:1983 character set.

     Considerations

     When approaching the problem, we identified a few major con-
     siderations.  The solution:

     o  must render reasonable results on an ICS terminal;
       Not all users will have access to resources that can
       display the complete set (indeed, few are expected to have
       the full set available); still others will continue to use
       the ubiquitous ICS terminal.  In such instances, this
       encoding must yield acceptable results.

     o  must be extensible, to incorporate future insights;
       Work continues on the definition and cataloging of
       national character sets.  One fairly extensive list, ISO
       DIS 10646 [8], is being compiled at the time of this
       writing.  Symbols will probably be added in the future; at
       such time, they should be accommodated.  Therefore,
       expandability is needed.

     o  must work with existing MTAs and UAs;
       System software is costly and difficult to install;
       further, current mail addressing techniques offer little
       or no practical control over the routing of messages.  As
       a result, mail may be carried by obsolete Message Transfer
       Agents (MTAs).  Further, message encoding is a
       presentation-level service, and is best dealt with by the
       User Agent (UA).  User Agent software may also be
       difficult to change, so a solution must be able to work
       with existing UAs.

     o  must align with internet methodology;
       As mentioned in the first point above, the user,
       implementor, or system administrator may not have access
       to adequate encoding/decoding or rendering facilities.  He
       may be obliged to view or enter/manipulate encoded text by
       hand.  In order to support this, an encoding format should
       be simple and intuitive.

     o  should interoperate with a broad range of systems;
       The current networking environment contains many diverse
       types of systems with varied interchange formats (e.g.
       BITNET, X.500, UUCP). To interoperate with the greatest
       number of them, exchange must be based on the most common
       assumptions: a limited character set, limited line
       lengths, etc.

     o  should be simple and unambiguous;
       Any solution that is to have widespread acceptance must be
       simple and unambiguous; indeed, the latter frequently
       precludes the former.

     Message Format

     As in [2], the message exists as a series of parts, each
     part being a group of lines containing characters in the
     character set employed.  Each part may or may not be encoded
     using this format; we concern ourselves in this document
     solely with those that are.  Within the relevant parts of
     the message, ordinary text may have occurrences of the fol-
     lowing sequence: an intro character (see below), followed by
     a string of characters that represent a character mnemonic,
     as given in RFC-CHAR.  Character mnemonics longer than two
     characters are surrounded by the underline character.
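
     The sequence described above can be sketched in a few
     lines of Python.  This is only an illustration: the
     function name escape() is hypothetical, "xyz" is a made-up
     long mnemonic, and mnemonics are assumed to be at least two
     characters long.  The real mnemonics are those listed in
     RFC-CHAR.

```python
INTRO = "&"  # assumed default: the recommended intro character (decimal 38)


def escape(mnemonic: str, intro: str = INTRO) -> str:
    """Build the escape sequence for one character mnemonic.

    Hypothetical helper: assumes the mnemonic has at least two characters.
    """
    if len(mnemonic) < 2:
        raise ValueError("this sketch expects mnemonics of two or more characters")
    if len(mnemonic) > 2:
        # Mnemonics longer than two characters are surrounded by underlines.
        return f"{intro}_{mnemonic}_"
    return intro + mnemonic


print(escape("e'"))   # two-character mnemonic
print(escape("xyz"))  # made-up three-character mnemonic
```

     A two-character mnemonic follows the intro character
     directly, while a longer one is wrapped in underline
     characters so that a reader can tell where it ends.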



     Content-Type Specifiers

     A message in this format bears the Content-Type: field in
     the message header, with the following parameters, separated
     by blanks.


     Keyword

     The keyword is given as Text/Mnemonic.  The text is intended
     to be read by the end user possibly without further inter-
     vention.

     charset

     The charset is given as one of the coded character set names
     in RFC-CHAR and is the encoding used for the transport.
     For general use on the Internet, only "ASCII" is allowed.
     The use of other character sets is limited to agreement
     between the communicating parties. When such an agreement
     has been reached, or when a User Agent is operating in
     a character set other than this transport character set,
     conversion of the message body part is done according to the
     tables in RFC-CHAR: characters occurring in both encodings
     are just transformed, and characters not existing in the
     receiving code are represented by the intro character of the
     receiving code plus the mnemonic from RFC-CHAR, as described
     under the intro character.  The Content-Type: header is
     changed accordingly to reflect such a conversion.
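
     A minimal sketch of this conversion follows, under stated
     assumptions: the function to_transport() is hypothetical,
     Python codec names ("ascii", "latin-1") stand in for the
     RFC-CHAR character set names, and the three-entry table is
     an illustrative sample, not the RFC-CHAR table itself.

```python
# Illustrative sample only; the authoritative table is in RFC-CHAR.
MNEMONICS = {"é": "e'", "ä": "a:", "ø": "o/"}


def to_transport(text: str, charset: str = "ascii", intro: str = "&") -> bytes:
    """Convert text to the receiving character set (hypothetical helper).

    Characters that exist in the target set are transformed directly;
    all others become the intro character plus their mnemonic.
    """
    out = []
    for ch in text:
        if ch == intro:
            # The intro character is doubled to represent itself.
            out.append(intro * 2)
            continue
        try:
            ch.encode(charset)
            out.append(ch)
        except UnicodeEncodeError:
            mnemonic = MNEMONICS[ch]  # KeyError if outside the sample table
            if len(mnemonic) > 2:
                out.append(f"{intro}_{mnemonic}_")
            else:
                out.append(intro + mnemonic)
    return "".join(out).encode(charset)


print(to_transport("café & thé"))
```

     With a richer target set such as "latin-1", the same text
     passes through without any mnemonics, since every character
     exists in the receiving code.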

     Intro

     The intro character is given as the decimal value of the
     intro character in the communication character set. The
     recommended value is 38 for the ampersand (&) character in
     ASCII. Another common value is 29 for the control character
     Group Separator (GS), which may be convenient when operating
     in some environments.  The intro character is used for
     introducing character mnemonics from RFC-CHAR when a
     character is not present in the communication character set
     (as defined in the "charset" parameter).  Character mnemonics
     longer than two characters are surrounded by the underline
     character. The intro character is doubled to represent one
     occurrence of itself.  Characters in the communication
     character set are normally just represented with their
     encoding, but may also be represented by the intro character
     and the mnemonic encoding.
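
     The rules above suggest a simple decoder.  The sketch below
     is an assumption-laden illustration, not a reference
     implementation: decode() is a hypothetical name, the table
     is a small sample rather than the RFC-CHAR table, and
     sequences the table does not know are left exactly as
     coded.

```python
# Illustrative sample only; the authoritative table is in RFC-CHAR.
MNEMONICS = {"e'": "é", "a:": "ä", "o/": "ø"}


def decode(text: str, intro: str = "&", table: dict = MNEMONICS) -> str:
    """Expand intro sequences in text (hypothetical helper)."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != intro:
            out.append(ch)
            i += 1
            continue
        if text.startswith(intro, i + 1):
            # A doubled intro character stands for one occurrence of itself.
            out.append(intro)
            i += 2
            continue
        name = None
        seq_end = i + 1
        if text.startswith("_", i + 1):
            # Long mnemonic: everything up to the closing underline.
            end = text.find("_", i + 2)
            if end != -1:
                name = text[i + 2:end]
                seq_end = end + 1
        else:
            # Short mnemonic: the two characters after the intro character.
            name = text[i + 1:i + 3]
            seq_end = i + 3
        if name in table:
            out.append(table[name])
            i = seq_end
        else:
            # Unknown or malformed sequence: keep it as coded.
            out.append(ch)
            i += 1
    return "".join(out)


print(decode("caf&e' && th&e'"))
```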

     version

     The version is two decimal numbers separated by a period.
     The current version is 1.0.  If an application conforming to
     this specification interoperates with other versions of this
     specification and encounters mnemonics that are undefined
     in this specification, it shall leave the mnemonic as it is
     coded. This provides for upward compatibility.
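
     As a small illustration, the version field could be
     checked and split as follows.  The name parse_version() is
     hypothetical, and rejecting anything other than two decimal
     numbers is this sketch's own choice.

```python
import re

# Two decimal numbers separated by a period, e.g. "1.0".
_VERSION = re.compile(r"([0-9]+)\.([0-9]+)")


def parse_version(field: str) -> tuple[int, int]:
    """Split a version field into two integers (hypothetical helper)."""
    match = _VERSION.fullmatch(field)
    if match is None:
        raise ValueError(f"not a version field: {field!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_version("1.0"))
```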

     orig-charset

     The orig-charset is given as the original character set
     name.  This may be set by the sending User Agent before con-
     verting the message into a character set suitable for tran-
     sport.  If no orig-charset is specified, the charset charac-
     ter set is used.

     orig-Intro

     The orig-intro character is given as the original intro
     character as used by the originating User Agent. The orig-
     charset and orig-intro may be used to recreate the message
     in its original encoding.  If no orig-intro character is
     specified, the intro character is used.


     Unknown options are ignored.



     Examples of headers:
     Content-Type: Text/Mnemonic ASCII 38
     Content-Type: Text/Mnemonic ASCII 38 1.0 ISO_8859-1 38
