Forwarding: dropped mail

This message was not delivered due to problems at our site. It is being resent.
The following is a copy of the message:

Received: from rutgers.edu (-:RUTGERS.EDU:-) by yonge.csri.toronto.edu via TCP 
with SMTP id AA06734; Mon, 15 Jul 91 19:47:16 EDT
Received: from dimacs.rutgers.edu by rutgers.edu (5.59/SMI4.0/RU1.4/3.08) 
        id AA14195; Mon, 15 Jul 91 19:45:08 EDT
Received: by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) 
        id AA29066; Mon, 15 Jul 91 19:03:01 EDT
Received: from dkuug.dk by dimacs.rutgers.edu (5.59/SMI4.0/RU1.4/3.08) 
        id AA29053; Mon, 15 Jul 91 19:02:50 EDT
Received: by dkuug.dk (5.64+/8+bit/IDA-1.2.8)
        id AA00223; Tue, 16 Jul 91 01:02:16 +0200
Date: Tue, 16 Jul 91 01:02:16 +0200
From: Keld J|rn Simonsen <keld(_at_)dkuug(_dot_)dk>
Message-Id: <9107152302(_dot_)AA00223(_at_)dkuug(_dot_)dk>
To: ietf-822(_at_)dimacs(_dot_)rutgers(_dot_)edu, 
rja7m(_at_)phil(_dot_)cs(_dot_)virginia(_dot_)edu
Subject: Re:  Problems with RFC-MNEM & RFC-CHAR
X-Charset: ASCII
X-Char-Esc: 29

It seems like the minds are converging. I can accept all of
Ran's comments. Here is a new RFC-MNEM - an RFC-CHAR is
available by ftp from dkuug.dk:pub/RFC-CHAR

Keld
-----








Network Working Group                                      Keld Simonsen
INTERNET-DRAFT                                   Danish Unix Users Group
                                              Philippe-Andre Prindeville
                                                           Telecom Paris
                                                          15th July 1991


                          Mnemonic Text Format


Status of the Memo

This memo specifies an encoding format that permits the exchange of tex-
tual messages consisting of characters from a wide range of interna-
tional scripts, including Latin, Cyrillic, Greek, Arabic, Hebrew, Kata-
kana, Hiragana and some special characters.

The encoding is designed so that originating and receiving end-user
equipment has a good chance of communicating intelligibly even when the
two ends have different capabilities, because a readable, unambiguous
mnemonic fallback is defined. Although this memo specifies ways of doing
character set conversions, character sets other than ASCII or its proper
subset ICS must not be used for general Internet use with the
specifications in this memo.

The memo supplements and is conformant with the set defined in [1], and
in [2], where the format specified in this memo is known as
Text/Quoted-Readable.  The format defined in this memo is the preferred
format for exchanging alphabetic messages. The memo uses the definitions
about characters and coded character sets as specified in [3].

Distribution of this memo is unlimited.  This draft document will be
submitted to the RFC editor for adoption as a protocol specification.
Please send any comments to Keld Simonsen 
<Keld(_dot_)Simonsen(_at_)dkuug(_dot_)dk>.

Introduction

As the Internet grows in size, the number and scope of users increases.
TCP/IP has become a player in the pan-European networking arena [4], and
the number of other networks that the Internet connects to is also
mounting.  In short, the Internet is becoming Internationalized.  With
this expanding circle of users, the breadth of their needs similarly
increases.

One of the most popular services of the Internet, electronic mail
(email), is also perhaps one of the least adequate to meet this new
demand.  Issues of addressing and gatewaying have been worked out and
solutions implemented, but email still bears the constraint that
messages be composed of the 7-bit ASCII graphical character set.  For
non-Anglophones, this is simply not adequate.  This memo defines
techniques to meet this international demand.



Simonsen & Prindeville                                          [Page 1]


INTERNET-DRAFT            Mnemonic Text Format                 July 1991


For the remainder of this document, we shall take the subset of ASCII
characters that have ISO, EBCDIC, and Teletext equivalents, and refer to
it as ICS (Invariant Character Set, equivalent to invariant ISO 646
[5]).  We regard this as the minimal universal character set.  Its con-
tents are given in [3] as the ISO_646.basic:1983 character set.

Considerations

When approaching the problem, we identified a few major considerations.
The solution:

- must render reasonable results on an ICS terminal;
  Not all users will have access to resources that can display the com-
  plete set (indeed, few are expected to have the full set available);
  still others will continue to use the ubiquitous ICS terminal.  In
  such instances, this encoding must yield acceptable results.

- must be extensible, to incorporate future insights;
  Work continues on the definition and cataloging of national character
  sets.  One fairly extensive list, ISO DIS 10646 [6], is being compiled
  at the time of this writing.  Symbols will probably be added in the
  future: at such time, they should be accommodated.  Therefore,
  expandability is needed.

- must work with existing MTAs and UAs;
  System software is costly and difficult to install; further, current
  mail addressing techniques offer little or impractical control of the
  routing of messages.  As a result, mail may be carried by obsolete
  Message Transfer Agents (MTAs).  Further, message encoding is a
  presentation-level service, and is best dealt with by the User Agent
  (UA).  User Agent software may also be difficult to change, so a solu-
  tion must be able to work with existing UAs.

- must align with internet methodology;
  As mentioned in the first point above, the user, implementor, or sys-
  tem administrator may not have access to adequate encoding/decoding or
  rendering facilities.  He may be obliged to view or enter/manipulate
  encoded text by hand.  In order to support this, an encoding format
  should be simple and intuitive.

- should interoperate with a broad range of systems;
  The current networking environment contains many diverse types of sys-
  tems with varied interchange formats (e.g.  BITNET, X.400, UUCP). To
  interoperate with the greatest number of them, exchange must be based
  on the most common assumptions: a limited character set, limited line
  lengths, etc.

- should be simple and unambiguous;
  Any solution that is to have widespread acceptance must be simple and
  unambiguous; indeed, the latter frequently precludes the former.






Message Format

As in [2], the message exists as a series of parts, each part being a
group of lines containing characters in the character set employed.
Each part may or may not be encoded using this format; we concern our-
selves in this document solely with those that are.  Within the relevant
parts of the message, ordinary text may have occurrences of the follow-
ing sequence: an intro character (see below), followed by a string of
characters that represent a character mnemonic, as given in [3].
Character mnemonics longer than two characters are surrounded by the
underline character.



Content-Type Specifiers

A message in this format bears the Content-Type: field in the message
header, with the following parameters, separated by blanks.


Keyword

The keyword is given as Text/Mnemonic.  The text is intended to be read
by the end user possibly without further intervention.

charset

The charset is given as one of the coded character set names in [3] and
is the encoding used for the transport.  For general use on the
Internet, only "ASCII" and "ICS" are allowed.  ASCII is the recommended
character set; ICS is very robust when traversing gateways, but it
causes trouble for (amongst other things) source code in several
programming languages.  The use of other character sets is limited to
agreement between the communicating parties.  When such an agreement has
been reached, or when a User Agent is operating in a character set other
than the transport character set, conversion of the message body part
is done according to the tables in [3]: characters occurring in both
encodings are simply transformed, and characters not existing in the
receiving code are represented by the intro character of the receiving
code plus the mnemonic from [3], as described under the intro character.
The Content-Type: header is changed accordingly to reflect such a
conversion.

An example of changing headers is the following: The UA runs in an 8-bit
character set:

Content-Type: Text/Mnemonic ISO_8859-1 29 1.0 ISO_8859-1 29

The MTA converts it before sending it to the recipient:

Content-Type: Text/Mnemonic ASCII 38 1.0 ISO_8859-1 29





Intro

The intro character is given as the decimal value of the intro character
in the transport character set.  The recommended value is 38 for the
ampersand (&) character in ASCII.  Another common value is 29 for the
control character Group Separator, which may be convenient when
operating in some environments.  The intro character is used for
introducing character mnemonics from [3] when a character is not present
in the mail transport character set (as defined in the "charset"
parameter).  Character mnemonics longer than two characters are
surrounded by the underline character.  The intro character is doubled
to represent one occurrence of itself.  Characters in the mail transport
character set are normally just represented with their encoding, but may
also be represented by the intro character and the mnemonic encoding.
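
The encoding rules in this section can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the
function name encode and the small MNEMONICS table are hypothetical, the
table entries stand in for the real tables of [3], and the long mnemonic
"smile" is invented purely to show the underline rule.  Only ASCII
transport is handled here.

```python
# Illustrative stand-in for the mnemonic tables of reference [3].
# All entries are assumptions made for this sketch; "smile" is invented.
MNEMONICS = {
    "\u00f8": "o/",     # o with stroke
    "\u00e9": "e'",     # e with acute accent
    "\u00e6": "ae",     # ae ligature
    "\u263a": "smile",  # hypothetical mnemonic longer than two characters
}


def encode(text, intro=38):
    """Encode text for ASCII transport using the given intro value."""
    intro_ch = chr(intro)
    out = []
    for ch in text:
        if ch == intro_ch:
            # The intro character is doubled to represent itself.
            out.append(intro_ch * 2)
        elif ord(ch) < 128:
            # Present in the transport character set: pass through.
            out.append(ch)
        else:
            # Not present: intro character plus mnemonic.
            # Raises KeyError if this sketch table lacks the character.
            mnemonic = MNEMONICS[ch]
            if len(mnemonic) > 2:
                # Long mnemonics are surrounded by the underline character.
                mnemonic = "_" + mnemonic + "_"
            out.append(intro_ch + mnemonic)
    return "".join(out)
```

With these assumed table entries, the text "J" + o-with-stroke + "rn & Co"
is encoded as "J&o/rn && Co" when the intro value is 38.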

version

The version is two decimal numbers separated by a period.  The current
version is 1.0.  If an application conforming to this specification
interoperates with other versions of this specification and encounters
mnemonics that are undefined in this specification, it shall leave the
mnemonic as it is coded.  This provides for upward compatibility.
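
A receiving application might apply the rule about undefined mnemonics
roughly as follows.  This is a minimal sketch, not a reference
implementation: the function name decode and the MNEMONIC_TO_CHAR table
are hypothetical, the table entries only stand in for the tables of [3],
and the long mnemonic "smile" is invented for illustration.  The sketch
also assumes that no two-character mnemonic begins with the underline
character.

```python
# Illustrative stand-in for the mnemonic tables of reference [3].
# All entries are assumptions made for this sketch; "smile" is invented.
MNEMONIC_TO_CHAR = {
    "o/": "\u00f8",
    "e'": "\u00e9",
    "ae": "\u00e6",
    "smile": "\u263a",
}


def decode(text, intro=38):
    """Decode mnemonic sequences, leaving undefined ones as coded."""
    intro_ch = chr(intro)
    out = []
    i = 0
    while i < len(text):
        if text[i] != intro_ch:
            out.append(text[i])
            i += 1
            continue
        if text.startswith(intro_ch, i + 1):
            # A doubled intro character stands for one intro character.
            out.append(intro_ch)
            i += 2
            continue
        if text.startswith("_", i + 1):
            # Long mnemonic: runs up to the closing underline.
            end = text.find("_", i + 2)
            if end == -1:
                # Unterminated sequence: keep the rest unchanged.
                out.append(text[i:])
                break
            raw = text[i:end + 1]
            name = text[i + 2:end]
        else:
            # Short mnemonic: the two characters after the intro.
            raw = text[i:i + 3]
            name = text[i + 1:i + 3]
        # Undefined mnemonics are left exactly as they were coded.
        out.append(MNEMONIC_TO_CHAR.get(name, raw))
        i += len(raw)
    return "".join(out)
```

Because an undefined mnemonic is copied through unchanged, text produced
by a later version of the tables still passes through such a decoder
without loss.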

orig-charset

The orig-charset is given as the original character set name.  This may
be set by the sending User Agent before converting the message into a
character set suitable for transport.  If no orig-charset is specified,
the charset character set is used.

orig-intro

The orig-intro character is given as the original intro character as
used by the originating User Agent. The orig-charset and orig-intro may
be used to recreate the message in its original encoding.  If no orig-
intro character is specified, the intro character is used.


Unknown options are ignored.

Examples of headers:

Content-Type: Text/Mnemonic ASCII 38
Content-Type: Text/Mnemonic ASCII 38 1.0 ISO_8859-1 38
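
Reading such a header field can be sketched as below.  The function name
parse_content_type and the returned dictionary keys are hypothetical.
The defaults for orig-charset and orig-intro follow the text above; the
default of "1.0" for an omitted version is an assumption of this sketch,
as is the requirement that charset and intro both be present.

```python
def parse_content_type(header):
    """Split a Text/Mnemonic Content-Type field into its parameters."""
    name, _, value = header.partition(":")
    if name.strip().lower() != "content-type":
        raise ValueError("not a Content-Type field")
    parts = value.split()  # parameters are separated by blanks
    if not parts or parts[0].lower() != "text/mnemonic":
        raise ValueError("not a Text/Mnemonic body part")
    if len(parts) < 3:
        # Assumption of this sketch: charset and intro must be given.
        raise ValueError("charset and intro are required")
    charset = parts[1]
    intro = int(parts[2])
    # Assumption: an omitted version means the current version, 1.0.
    version = parts[3] if len(parts) > 3 else "1.0"
    # If no orig-charset is specified, the charset is used.
    orig_charset = parts[4] if len(parts) > 4 else charset
    # If no orig-intro is specified, the intro character is used.
    orig_intro = int(parts[5]) if len(parts) > 5 else intro
    # Anything after the sixth field is an unknown option and is ignored.
    return {
        "charset": charset,
        "intro": intro,
        "version": version,
        "orig_charset": orig_charset,
        "orig_intro": orig_intro,
    }
```

Applied to the first example header, this yields charset "ASCII" and
intro 38, with orig-charset and orig-intro taking the same values.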

Acknowledgements

This memo was inspired by [1],[7], and [8], as well as by conversations
with Justin Bur of l'E'cole Polytechnique Federale de Lausanne (EPFL)
and people active within l'Association Franc,ais d'Utilisateurs Unix
(AFUU), and les Re'seaux Associe's pour la Recherche Europeen (RARE).




REFERENCES


[1]
   D. Robinson, R. Ullman, ``Encoding Header Field for Internet Mes-
   sages,'' RFC 1154, April 1990.


[2]
   Nathaniel Borenstein, Ned Freed, ``Mechanism for Specifying and
   Describing the Format of Internet Message Bodies,'' Internet draft,
   June 1991.


[3]
   Keld Simonsen, ``Character Mnemonics & Character Sets'', Internet
   draft, July 1991.


[4]
   R. Blokzijl, ``RIPE: IP coordination in Europe'' in Computer Networks
   and ISDN Systems, Nos. 3-5, November 1990.


[5]
   ISO 646:1983 ``Seven Bit Code for Information Interchange''.


[6]
   ISO DIS 10646 ``Universal Character Set Code (UCS)'', ISO/IEC
   JTC1/SC2/WG3 N666, November 1990.


[7]
   M. Sirbu, ``Content-Type Header Field for Internet Messages,'' RFC
   1049, March 1988.


[8]
   J.W. van Wingen, ``Networks and Coded Character Sets'' in Computer
   Networks and ISDN Systems, Nos. 3-5, November 1990.












