Here is the Mnemonic RFC draft (also known as the
quoted-readable text type in RFC-XXXX).
It refers to another RFC on character sets that I hesitate to
distribute to everybody, as it is almost 200 kB.
Keld
---
Network Working Group                         Philippe-Andre Prindeville
Request For Comments: DRAFT                                Telecom Paris
                                                           Keld Simonsen
                                                 Danish Unix Users Group
                                                               July 1991
                          Mnemonic Text Format
Status of the Memo
This memo specifies an encoding format that permits the
exchange of textual messages consisting of characters from a
wide range of international scripts, including Latin, Cyril-
lic, Greek, Arabic, Hebrew, Katakana, Hiragana and some spe-
cial characters.
The encoding is designed so that originating and receiving
end-user equipment have a good chance of communicating
intelligibly even when their capabilities differ, as a
readable, unambiguous mnemonic fallback is defined.
This supplements the set defined in [1], and in RFC-XXXX
[2], where the format specified in this memo is known as
Text/Quoted-Readable. The format defined in this memo is
the preferred format for exchanging alphabetic messages. The
memo uses the definitions about characters and coded charac-
ter sets as specified in RFC-CHAR [3].
Distribution of this memo is unlimited.
Acknowledgements
This memo was inspired by [1],[4], and [5], as well as by
conversations with Justin Bur of l'E'cole Polytechnique
Federale de Lausanne (EPFL) and people active within
l'Association Franc,ais d'Utilisateurs Unix (AFUU), and les
Re'seaux Associe's pour la Recherche Europeen (RARE).
Introduction
As the Internet grows in size, the number and scope of users
increases. TCP/IP has become a player in the pan-European
networking arena [6], and the number of other networks that
the Internet connects to is also mounting. In short, the
Internet is becoming Internationalized. With this expanding
circle of users, the breadth of their needs similarly
increases.
One of the most popular services of the Internet, electronic
mail (email), is also perhaps one of the least adequate to
meet this new demand. Solutions for addressing and gatewaying
Prindeville & Simonsen [Page 1]
RFC-MNEM Mnemonic Text Format July 1991
have been conceived and implemented, but email still bears
the constraint that messages be composed of the 7-bit ASCII
graphical character set. For non-Anglophones, this is sim-
ply not adequate. This memo defines techniques to meet this
international demand.
For the remainder of this document, we shall take the subset
of ASCII characters that have ISO, EBCDIC, and Teletext
equivalents, and refer to it as ICS (Invariant Character
Set, equivalent to invariant ISO 646 [7]). We regard this
as the minimal universal character set. Its contents are
given in RFC-CHAR as the ISO_646.inv:1983 character set.
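As a rough illustration of the ICS notion, the sketch below checks
whether a text stays within an invariant subset of ASCII. It assumes
that the excluded positions are the twelve ISO 646 national-use
positions (shown here by their ASCII occupants); the authoritative
contents are those given in RFC-CHAR. The names NATIONAL_USE,
is_ics_char and non_ics_chars are hypothetical.

```python
# Assumption: the twelve ISO 646 positions reserved for national use,
# written as the characters ASCII places there. RFC-CHAR is authoritative.
NATIONAL_USE = set("#$@[\\]^`{|}~")


def is_ics_char(ch: str) -> bool:
    """True if ch is a printable ASCII character outside the national-use positions."""
    return 0x20 <= ord(ch) <= 0x7E and ch not in NATIONAL_USE


def non_ics_chars(text: str) -> list[str]:
    """Return the characters of text (newlines excluded) that fall outside the assumed ICS."""
    return [ch for ch in text if ch != "\n" and not is_ics_char(ch)]


if __name__ == "__main__":
    print(non_ics_chars("Hello, world!"))
    print(non_ics_chars("user@host"))
```

A message for which non_ics_chars returns an empty list would, under
these assumptions, survive display on an ICS terminal unchanged.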
Considerations
When approaching the problem, we identified a few major con-
siderations. The solution:
o  must render reasonable results on an ICS terminal;
Not all users will have access to resources that can
display the complete set (indeed, few are expected to have
the full set available); still others will continue to use
the ubiquitous ICS terminal. In such instances, this
encoding must yield acceptable results.
o  must be extensible, to incorporate future insights;
Work continues on the definition and cataloging of
national character sets. One fairly extensive list, ISO
DIS 10646 [8], is being compiled at the time of this
writing. Symbols will probably be added in the future: at
such time, they should be accommodated. Therefore,
expandability is needed.
o  must work with existing MTAs and UAs;
System software is costly and difficult to install;
further, current mail addressing techniques offer little
or impractical control of the routing of messages. As a
result, mail may be carried by obsolete Message Transfer
Agents (MTAs). Further, message encoding is a
presentation-level service, and is best dealt with by the
User Agent (UA). User Agent software may also be diffi-
cult to change, so a solution must be able to work with
existing UAs.
o  must align with Internet methodology;
As mentioned in the first point above, the user, implemen-
tor, or system administrator may not have access to ade-
quate encoding/decoding or rendering facilities. He may
be obliged to view or enter/manipulate encoded text by
hand. In order to support this, an encoding format should
be simple and intuitive.
o  should interoperate with a broad range of systems;
The current networking environment contains many diverse
types of systems with varied interchange formats (e.g.
BITNET, X.500, UUCP). To interoperate with the greatest
number of them, exchange must be based on the most common
assumptions: a limited character set, limited line
lengths, etc.
o  should be simple and unambiguous;
Any solution that is to have widespread acceptance must be
simple and unambiguous; indeed, the latter frequently
precludes the former.
Message Format
As in [2], the message exists as a series of parts, each
part being a group of lines containing characters in the
character set employed. Each part may or may not be encoded
using this format; we concern ourselves in this document
solely with those that are. Within the relevant parts of
the message, ordinary text may have occurrences of the fol-
lowing sequence: an intro character (see below), followed by
a string of characters that represent a character mnemonic,
as given in RFC-CHAR. Character mnemonics longer than
two characters are surrounded by the underline
character.
Content-Type Specifiers
A message in this format bears the Content-Type: field in
the message header, with the following parameters, separated
by blanks.
Keyword
The keyword is given as Text/Mnemonic. The text is intended
to be read by the end user possibly without further inter-
vention.
charset
The charset is given as one of the coded character set names
in RFC-CHAR and is the encoding used for transport.
For general use on the Internet, only "ASCII" is allowed.
The use of other character sets is limited to agreement
between the communicating parties. When such an agreement
has been reached, or when a User Agent is operating in
a character set other than the transport character set,
conversion of the message body part is done according to the
tables in RFC-CHAR: characters occurring in both encodings
are just transformed, and characters not existing in the
receiving code are represented by the intro character of the
receiving code plus the mnemonic from RFC-CHAR, as described
under the intro character. The Content-Type: header is
changed accordingly to reflect such conversion.
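The conversion described above might be sketched as follows. This is
an illustration under stated assumptions, not the RFC-CHAR tables: the
MNEMONICS mapping is a tiny made-up sample (including a hypothetical
long mnemonic, "snowman"), Python codec names such as "ascii" and
"latin-1" stand in for the RFC-CHAR charset names, and the function
name encode is hypothetical. The sketch also applies the doubling and
underline rules given under the intro parameter.

```python
# Illustrative sample only; the authoritative mnemonics are in RFC-CHAR.
MNEMONICS = {
    "\u00e9": "e'",        # e with acute accent (assumed mnemonic)
    "\u00e4": "a:",        # a with diaeresis (assumed mnemonic)
    "\u2603": "snowman",   # hypothetical mnemonic longer than two characters
}


def encode(text: str, charset: str = "ascii", intro: str = "&") -> str:
    """Rewrite text so that it uses only characters of the receiving charset."""
    out = []
    for ch in text:
        if ch == intro:
            out.append(intro * 2)          # a literal intro character is doubled
            continue
        try:
            ch.encode(charset)             # present in the receiving charset
            out.append(ch)
        except UnicodeEncodeError:
            mnemonic = MNEMONICS[ch]       # KeyError if the sample table lacks it
            if len(mnemonic) > 2:
                mnemonic = "_" + mnemonic + "_"
            out.append(intro + mnemonic)
    return "".join(out)


if __name__ == "__main__":
    print(encode("caf\u00e9 & th\u00e9"))
    print(encode("\u2603"))
```

With the sample table, a character that exists in the receiving
charset (for instance e with acute when the charset is "latin-1") is
passed through unchanged, while the same character sent as "ascii"
becomes the intro character followed by its mnemonic.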
Intro
The intro character is given as the decimal value of the
intro character in the communication character set. The
recommended value is 38 for the ampersand (&) character in
ASCII. Another common value is 29 for the control character
Group Separator (GS), which may be convenient when operating in
some environments. The intro character is used for intro-
ducing character mnemonics from RFC-CHAR when a character is
not present in the communicating character set (as defined
in the "charset" parameter). Character mnemonics longer
than two characters are surrounded by the underline
character. The intro character is doubled to represent one
occurrence of itself. Characters in the communication character
set are normally just represented with their encoding, but
may also be represented by the intro character and the
mnemonic encoding.
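The rules for the intro character can be illustrated by a small
decoder. It is a sketch only: the MNEMONICS table is a made-up sample
(with "snowman" as a hypothetical long mnemonic), the function name
decode is hypothetical, and sequences whose mnemonic is not in the
table are copied through as coded, which is one possible way of
handling them.

```python
# Illustrative sample only; the authoritative mnemonics are in RFC-CHAR.
MNEMONICS = {
    "e'": "\u00e9",        # assumed mnemonic for e with acute accent
    "a:": "\u00e4",        # assumed mnemonic for a with diaeresis
    "snowman": "\u2603",   # hypothetical mnemonic longer than two characters
}


def decode(text: str, intro: str = "&") -> str:
    """Replace intro-prefixed mnemonics by the characters they stand for."""
    out = []
    i = 0
    while i < len(text):
        if text[i] != intro:
            out.append(text[i])                # ordinary character
            i += 1
        elif text.startswith(intro, i + 1):
            out.append(intro)                  # doubled intro: one literal intro
            i += 2
        else:
            if text.startswith("_", i + 1):    # long form: intro _ name _
                end = text.find("_", i + 2)
                stop = end + 1 if end != -1 else len(text)
                name = text[i + 2:end] if end != -1 else None
            else:                              # short form: intro + two characters
                stop = min(i + 3, len(text))
                name = text[i + 1:stop]
            raw = text[i:stop]
            out.append(MNEMONICS.get(name, raw))   # unknown: keep as coded
            i = stop
    return "".join(out)


if __name__ == "__main__":
    print(decode("caf&e' && th&e'"))
```

Because the underline delimits long mnemonics and a doubled intro is
consumed as a pair, the decoder never has to guess where a mnemonic
ends.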
version
The version is two decimal numbers separated by a period.
The current version is 1.0. If an application conforming to
this specification interoperates with other versions of this
specification and encounters mnemonics that are undefined
in this specification, it shall leave the mnemonic
as it is coded. This provides for upward compatibility.
orig-charset
The orig-charset is given as the original character set
name. This may be set by the sending User Agent before
converting the message into a character set suitable for
transport. If no orig-charset is specified, the character set
named by charset is used.
orig-intro
The orig-intro character is given as the original intro
character as used by the originating User Agent. The orig-
charset and orig-intro may be used to recreate the message
in its original encoding. If no orig-intro character is
specified, the intro character is used.
Unknown options are ignored.
Examples of headers:
Content-Type: Text/Mnemonic ASCII 38
Content-Type: Text/Mnemonic ASCII 38 1.0 ISO_8859-1 38
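A reader of such headers might be sketched as below. Several points
are assumptions rather than part of this memo: the parameters are
taken to be positional in the order keyword, charset, intro, version,
orig-charset, orig-intro (as the examples suggest), an omitted version
is taken to be 1.0, the field name and keyword are matched without
regard to case, and the names MnemonicType and parse_content_type are
hypothetical. Fields beyond the sixth are ignored.

```python
from dataclasses import dataclass


@dataclass
class MnemonicType:
    charset: str
    intro: str            # the intro character itself, not its decimal value
    version: tuple        # (major, minor)
    orig_charset: str
    orig_intro: str


def parse_content_type(line: str) -> MnemonicType:
    """Parse a Content-Type: header whose keyword is Text/Mnemonic."""
    name, _, value = line.partition(":")
    if name.strip().lower() != "content-type":
        raise ValueError("not a Content-Type header")
    fields = value.split()                     # parameters are separated by blanks
    if len(fields) < 3 or fields[0].lower() != "text/mnemonic":
        raise ValueError("not a Text/Mnemonic content type")
    charset = fields[1]
    intro = chr(int(fields[2]))                # decimal value, e.g. 38 for '&'
    version = fields[3] if len(fields) > 3 else "1.0"   # assumed default
    major, minor = (int(part) for part in version.split("."))
    # Defaults stated in the memo: fall back to charset and intro.
    orig_charset = fields[4] if len(fields) > 4 else charset
    orig_intro = chr(int(fields[5])) if len(fields) > 5 else intro
    return MnemonicType(charset, intro, (major, minor), orig_charset, orig_intro)


if __name__ == "__main__":
    print(parse_content_type("Content-Type: Text/Mnemonic ASCII 38"))
    print(parse_content_type(
        "Content-Type: Text/Mnemonic ASCII 38 1.0 ISO_8859-1 38"))
```

For the first example header, the sketch fills in ASCII and the
ampersand as the original charset and intro character, following the
defaults given for orig-charset and orig-intro.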