OK, here are excerpts from the most recent EUnet backbone meeting, where
we talked about mnemonics. They are taken from the EUnet minutes
repository at mcsun.eu.net.
EUnet Backbone Managers
Minutes of Copenhagen Meeting
16th, 17th and 18th March 1991
Attendance: (29 people)
...
5.0 Voting Points and Decisions
....
5.10 All backbones are strongly encouraged to install the
DKnet sendmail patches for character set handling very
soon (if possible), once they are available for the most recent
version of sendmail. Keld is strongly encouraged to get
the patches included in the official release and to write
documentation on their use. Motion passed, 2 abstentions.
....
Appendix A
Report from the character sets in mail WG at the Copenhagen BB meeting
Keld Simonsen
Character set issues are a growing concern in EUnet, with a lot of vendors
providing 8-bit support. Keld Simonsen has made an implementation for
sendmail+IDA 5.61, 5.64 and 5.65 covering many character sets and
characters.
The initial implementation was done in February 1990, and at Breukelen on
2 April 1990 it was decided that all EUnet backbones should run these
patches to sendmail (if possible).
An article about this appeared in the winter 1990/91 issue of the EUUG
newsletter. The patches are currently installed at the DK, FR and GR
backbones. FI plans to install the patches soon.
Recently there has been extensive discussion of the subject within the IETF,
including at the IETF meeting in St. Louis, USA, 1991-03-11/15, and NORDUnet
has also discussed it. NORDUnet has decided to take another approach, at
least for MTA-to-MTA communication (an approach incorporating ideas that
were also included in the DKnet scheme).
Unfortunately, there seems to be no implementation of this scheme yet, and
the initial implementation would likely not support anything beyond
LATIN1, which is insufficient for EUnet purposes. NORDUnet will discuss
character sets in mail further on 1991-03-20/22. Keld has
written a draft RFC on parts of the current DKnet scheme.
RARE is also discussing the issue, but has come to it rather late.
The EUnet plenary proposed that the patches be included in the de facto
standard sendmail distribution from the University of Illinois.
Marius, Martijn, Petri, Hans Petter and Keld participated in the WG. The WG
discussed the DKnet solution and was very concerned about IBM code pages
being sent on the lines; this should be avoided whenever possible. Only
standard character sets should be allowed. The WG reached no conclusion on
whether to restrict this to 10646 (the new big standard character set) and
ASCII only, or to also allow the 8859 character sets. It was agreed that
when 8-bit data is used on the line, its use should be negotiated.
Iceland is of the opinion that all users should be required to use the same
character set, both in mail and internally on all machines. Others said
that they would have great difficulty enforcing such a policy. The
DKnet implementation supports both approaches, leaving the choice as a
policy decision for each backbone.
The WG asked for more documentation on the actual implementation and on
how to configure it.
The WG decided to strongly encourage all backbones to install the DKnet
patches very soon.
FI and NO had reservations, as they are both EUnet and NORDUnet
backbones and would like to see what NORDUnet decides. As the patches
seem to carry only a small performance penalty, they do not really
cause any harm, and people were confident about simply installing them.
Later comment: Keld intends to set up a mailing list on the subject.
Multi character set support in sendmail
The patches add support to sendmail for character sets other than ASCII.
About 90 7-bit and 8-bit character sets are supported, including the
ISO 8859 character sets, PC code pages, and the Mac character set. In fact,
almost all of the so-called ECMA collection of character sets is covered,
plus quite a few vendor-defined character sets.
The patches can be used to support local equipment that uses character sets
other than ASCII; when a message is transmitted out of the machine, it is
converted to ASCII, so full interoperability and transparency are
guaranteed. Note that sending 8-bit characters on the line in SMTP
violates current RFC standards and will produce quite unintelligible
messages on many systems.
The conversion to ASCII produces a mnemonic representation, so a
recipient without special software can still read the message. This
conversion is fully reversible without loss of information, so if the
receiver has suitable software and hardware, the message can be displayed
correctly.
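The reversibility described above can be illustrated with a small C sketch. It assumes a simplified scheme that may differ from the actual DKnet encoding: each Latin-1 byte above 127 is written as the compose character followed by a two-letter mnemonic, and a literal compose character is doubled so that decoding is unambiguous. The table is a hypothetical excerpt covering a few letters only, and the names mnemonic_encode() and mnemonic_decode() are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical excerpt of a mnemonic table (Latin-1 byte -> two ASCII
   characters). Not the DKnet tables; just enough for the example. */
struct mnemonic {
    unsigned char latin1;
    char mn[3];
};

static const struct mnemonic table[] = {
    {0xE6, "ae"}, {0xC6, "AE"},               /* ae ligature */
    {0xF8, "o/"}, {0xD8, "O/"},               /* o with stroke */
    {0xE5, "aa"}, {0xC5, "AA"},               /* a with ring above */
    {0xE4, "a:"}, {0xF6, "o:"}, {0xFC, "u:"}, /* diaeresis */
    {0xE9, "e'"}, {0xDF, "ss"},               /* e acute, sharp s */
};
#define NTABLE (sizeof table / sizeof table[0])

/* Encode Latin-1 text as 7-bit ASCII: a literal compose character is
   doubled, a byte >= 0x80 becomes esc + mnemonic. Returns 0, or -1 if the
   output buffer is too small or a byte has no mnemonic in this table. */
int mnemonic_encode(const char *in, char *out, size_t outsz, char esc)
{
    size_t o = 0;
    assert(esc != '\0' && (unsigned char)esc < 0x80);
    if (outsz == 0)
        return -1;
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        char buf[3];
        size_t n = 0, i;
        if (c == (unsigned char)esc) {
            buf[n++] = esc;
            buf[n++] = esc;
        } else if (c < 0x80) {
            buf[n++] = (char)c;
        } else {
            i = 0;
            while (i < NTABLE && table[i].latin1 != c)
                i++;
            if (i == NTABLE)
                return -1;
            buf[n++] = esc;
            buf[n++] = table[i].mn[0];
            buf[n++] = table[i].mn[1];
        }
        if (o + n >= outsz)          /* keep room for the final NUL */
            return -1;
        memcpy(out + o, buf, n);
        o += n;
    }
    out[o] = '\0';
    return 0;
}

/* Decode: esc+esc -> esc, esc+mnemonic -> Latin-1 byte, else copy.
   Returns 0, or -1 on a truncated or unknown sequence or a small buffer. */
int mnemonic_decode(const char *in, char *out, size_t outsz, char esc)
{
    size_t o = 0, i;
    if (outsz == 0)
        return -1;
    while (*in != '\0') {
        char c;
        if (*in != esc) {
            c = *in++;
        } else if (in[1] == esc) {
            c = esc;
            in += 2;
        } else {
            if (in[1] == '\0' || in[2] == '\0')
                return -1;
            i = 0;
            while (i < NTABLE &&
                   !(table[i].mn[0] == in[1] && table[i].mn[1] == in[2]))
                i++;
            if (i == NTABLE)
                return -1;
            c = (char)table[i].latin1;
            in += 3;
        }
        if (o + 1 >= outsz)
            return -1;
        out[o++] = c;
    }
    out[o] = '\0';
    return 0;
}
```

With '&' as the compose character, "København & Malmö" encodes to "K&o/benhavn && Malm&o:" and decodes back to the original bytes. The doubling rule is what keeps the mapping lossless: without it, a literal "&" followed by two letters could not be told apart from a mnemonic.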
A conversion program, 'conv', is also available. It can be used as a
filter, e.g. together with a pager in a mail reader, and gives easy
rudimentary support when the recipient has the necessary hardware.
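A conv-style filter could work along the lines of the following C sketch. It is not the real conv program: it assumes mnemonic input using a compose character with the doubling convention, a display using IBM code page 850, and a hypothetical table limited to the Danish letters. Unrecognised sequences are passed through untouched, so the reader still sees the readable mnemonic. The function name conv_filter() is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical mini-table: two-letter mnemonic -> CP850 byte. */
static const struct {
    char mn[3];
    unsigned char cp850;
} to_cp850[] = {
    {"ae", 0x91}, {"AE", 0x92},
    {"o/", 0x9B}, {"O/", 0x9D},
    {"aa", 0x86}, {"AA", 0x8F},
};

/* Filter in -> out: esc+mnemonic becomes the CP850 byte, esc+esc becomes a
   single esc. Unknown or truncated sequences are copied through unchanged.
   Returns 0, or EOF if a write error occurred on out. */
int conv_filter(FILE *in, FILE *out, int esc)
{
    int c, a, b;
    size_t i;
    const size_t n = sizeof to_cp850 / sizeof to_cp850[0];

    assert(in != NULL && out != NULL);
    while ((c = getc(in)) != EOF) {
        if (c != esc) {
            putc(c, out);
            continue;
        }
        a = getc(in);
        if (a == esc) {                 /* doubled compose character */
            putc(esc, out);
            continue;
        }
        b = (a == EOF) ? EOF : getc(in);
        for (i = 0; i < n; i++)
            if (a == to_cp850[i].mn[0] && b == to_cp850[i].mn[1])
                break;
        if (i < n) {
            putc(to_cp850[i].cp850, out);
        } else {                        /* not recognised: pass through */
            putc(esc, out);
            if (a != EOF)
                putc(a, out);
            if (b != EOF)
                ungetc(b, in);          /* re-scan b; it may start a sequence */
        }
    }
    return ferror(out) ? EOF : 0;
}
```

Passing unknown sequences through, rather than rejecting them, suits a pager filter: a mnemonic missing from the table still shows up in its readable ASCII form.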
When mutual agreement has been achieved between two communicating systems,
these patches can also be used to transmit 8-bit character sets. This is
useful for instance between UUCP sites, or with sites that just send 8-bit
mail out over SMTP.
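The rule that 8-bit data goes on the line only by mutual agreement can be expressed as a small decision function. This C sketch is hypothetical: in the patches themselves the agreement is expressed by selecting, per site, a mailer whose C= names an 8-bit character set, not by a runtime flag like peer_agreed_8bit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of the per-site decision. */
enum line_encoding {
    SEND_7BIT,  /* plain ASCII, with mnemonics for any non-ASCII characters */
    SEND_8BIT   /* raw 8-bit bytes in the agreed character set */
};

/* Nonzero if any byte has the high bit set. */
static int has_8bit(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (buf[i] & 0x80)
            return 1;
    return 0;
}

/* 8-bit data goes on the line only when the peer has agreed to it;
   otherwise the body is sent as 7-bit ASCII (mnemonic form). */
enum line_encoding choose_encoding(const unsigned char *body, size_t len,
                                   int peer_agreed_8bit)
{
    assert(body != NULL || len == 0);
    if (!has_8bit(body, len))
        return SEND_7BIT;          /* pure ASCII: nothing to negotiate */
    return peer_agreed_8bit ? SEND_8BIT : SEND_7BIT;
}
```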
Installation of character set support patches to sendmail
The patches have been used with sendmail 5.61, 5.64 and 5.65. They will most
likely also work with later sendmail versions.
The IDA patches are required.
The sendmail patches require prior installation of the character set
conversion package. This currently consists of two shar files, ch.shar01 and
ch.shar02. Create a source directory for them, unshar them there, and run make
and make install. The default installation is in /usr/lib/char, /usr/lib
and /usr/include, so you need sufficient privileges for these directories.
The 'conv' program is not installed anywhere, but could be copied to
/usr/local/bin or similar places for public access.
Then apply the sendmail patches to the source of sendmail, and remake and
reinstall sendmail.
Configuration
In the ida/cf/Sendmail.mc file you should define the local mailer to use the
internal character set of your machine. If this is, e.g., ISO 8859-1 (LATIN1),
it can be done with:
Mlocal, P=/bin/mail, F=DFMSlmnrs, A=mail -d $u, C=LATIN1, X=29
Valid character set names, and the encodings they cover, are specified in the
file CHARSETS in the character set conversion package. The character set is
specified with the "C=" option in the mailer specification. Case is not
significant.
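Because case is not significant in C= values, a lookup has to compare names case-insensitively. The sketch below is hypothetical: the three-entry known_charsets[] table stands in for the CHARSETS file, whose format is not described here, and the comparison is written out by hand since strcasecmp() is not part of ISO C.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for the CHARSETS file: a few names accepted by C=. */
static const char *const known_charsets[] = { "ASCII", "LATIN1", "CP850" };

/* Compare two names ignoring ASCII case. */
static int name_eq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;               /* both strings must end together */
}

/* Return the canonical spelling of a C= value, or NULL if unknown. */
const char *lookup_charset(const char *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < sizeof known_charsets / sizeof known_charsets[0]; i++)
        if (name_eq(name, known_charsets[i]))
            return known_charsets[i];
    return NULL;
}
```

Returning the canonical spelling means "C=latin1" (as in the UUCP example that follows) and "C=LATIN1" resolve to the same entry.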
Another parameter is the compose character, which marks the start of a
mnemonic for a character that is not available in the target character set
(an "out-of-band" character). This can be "&" or a control character, for
instance ASCII decimal 29, which has the advantage of being invisible in the
'more' and 'page' pagers (but not in 'less'). The compose character is given
by the "X=" option followed by its decimal ASCII (or other character set) value.
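The following C sketch shows one way the two new mailer fields could be picked out of a mailer definition such as the local mailer line above, with X= read as a decimal number. The struct and function names are hypothetical and not taken from the patched headers.c or sendmail.h; the sketch also assumes C= and X= values contain no commas or blanks, which holds for all the examples in this document.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for the two new mailer fields. */
struct mailer_charset {
    char charset[32];   /* value of C=, e.g. "LATIN1" */
    int compose;        /* value of X=, e.g. 29 */
};

/* Scan a comma-separated mailer definition for the C= and X= fields.
   Returns 0 if both are present and X= is a decimal value 1..255,
   -1 otherwise. All other fields are skipped. */
int parse_mailer_fields(const char *line, struct mailer_charset *mc)
{
    const char *p = line;
    int have_c = 0, have_x = 0;

    assert(line != NULL && mc != NULL);
    while (*p != '\0') {
        while (*p == ',' || isspace((unsigned char)*p))
            p++;                                /* skip separators */
        if (p[0] == 'C' && p[1] == '=') {
            size_t n = strcspn(p + 2, ", \t\r\n");
            if (n == 0 || n >= sizeof mc->charset)
                return -1;
            memcpy(mc->charset, p + 2, n);
            mc->charset[n] = '\0';
            have_c = 1;
        } else if (p[0] == 'X' && p[1] == '=') {
            char *end;
            long v = strtol(p + 2, &end, 10);   /* decimal, as for X= */
            if (end == p + 2 || v < 1 || v > 255)
                return -1;
            mc->compose = (int)v;
            have_x = 1;
        }
        while (*p != '\0' && *p != ',')
            p++;                                /* move to the next field */
    }
    return (have_c && have_x) ? 0 : -1;
}
```

Decimal 29 is the ASCII GS (group separator) control character, which is why it stays invisible in some pagers.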
If you have special arrangements, for instance with UUCP or SMTP sites
connecting to you, you should provide mailers for them in the Sendmail.mc
file, such as:
MUUCP, P=/usr/bin/uux, F=CDFMUVSpu, S=19, R=19, A=uux - -z -r $h!rmail ($u), C=ASCII, X=29
MUUCP-L1, P=/usr/bin/uux, F=CDFMUSpu, S=19/0, R=19/0, A=uux - -z -r $h!rmail ($u), C=latin1, X=29
MUUCP-850, P=/usr/bin/uux, F=CDFMUSpu, S=19/0, R=19/0, A=uux - -z -r $h!rmail ($u), C=CP850, X=29
MTCP, P=[IPC], F=CDFMXhnmpu, E=\r\n, A=IPC $h, C=ASCII, X=29
MTCP-L1, P=[IPC], F=CDFMXhnmpu, E=\r\n, A=IPC $h, C=LATIN1, X=29
MTCP-850, P=[IPC], F=CDFMXhnmpu, E=\r\n, A=IPC $h, C=CP850, X=29
Then, in /usr/lib/mail/mailertable, declare the appropriate mailer for each
of these sites.
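The effect of the mailertable step is that only sites with a special arrangement get an 8-bit mailer, while every other site gets the ASCII one and therefore receives mnemonics. A minimal C sketch of that selection follows; the site names are invented, the mailer names are those defined above without the leading M, and the in-memory table does not reflect the actual mailertable file syntax.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical in-memory version of per-site mailertable entries. */
struct site_mailer {
    const char *site;
    const char *mailer;
};

static const struct site_mailer mailertable[] = {
    { "latin1-peer", "TCP-L1"   },  /* agreed to receive LATIN1 over SMTP */
    { "pc-peer",     "UUCP-850" },  /* UUCP neighbour using CP850 */
};

/* Pick the mailer for a site; sites without an arrangement get the
   default (ASCII) mailer, so their mail is converted to mnemonics. */
const char *mailer_for_site(const char *site, const char *default_mailer)
{
    size_t i;
    assert(site != NULL && default_mailer != NULL);
    for (i = 0; i < sizeof mailertable / sizeof mailertable[0]; i++)
        if (strcmp(site, mailertable[i].site) == 0)
            return mailertable[i].mailer;
    return default_mailer;
}
```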
The modifications to IDA sendmail consist of:
1. New mailer definition fields: C for the character set, and X for the
escape character in decimal, with the corresponding changes to headers.c
and sendmail.h.
2. In collect.c, the mailer for the incoming mail is determined, and its
character set and escape character are used, unless an "X-Charset" or
"X-Char-Esc" header is present, in which case the header values are used
instead. The headers are left untouched. The body is converted from that
character set to ASCII, which is sendmail's internal character set.
3. In deliver.c, the receiving mailer and its character set are determined.
This information is written into two new headers or, if those headers
already exist, the existing ones are revised. The body of the message is
then converted from ASCII to the receiving character set.
4. A new routine, findhead(), has been added in headers.c to look up and
modify a header.
5. The routine sfgets() in util.c has been changed to allow 8-bit data.
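Step 2 can be sketched in C as follows: the mailer's C= and X= values serve as defaults, and X-Charset or X-Char-Esc headers, when present, override them. The header structure, find_header() (modelled loosely on the role of findhead(), not on its real signature) and incoming_charset() are all hypothetical, and the X-Char-Esc value is assumed to be decimal like X=.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical header representation; sendmail's own header list differs. */
struct header {
    const char *name;
    const char *value;
};

/* Return the value of the first header whose name matches, ignoring case
   (mail header field names are case-insensitive), or NULL if none. */
const char *find_header(const struct header *h, size_t n, const char *name)
{
    size_t i, k;
    for (i = 0; i < n; i++) {
        for (k = 0; h[i].name[k] != '\0' && name[k] != '\0'; k++)
            if (tolower((unsigned char)h[i].name[k]) !=
                tolower((unsigned char)name[k]))
                break;
        if (h[i].name[k] == '\0' && name[k] == '\0')
            return h[i].value;
    }
    return NULL;
}

/* Decide the character set and compose character of an incoming message:
   the mailer's C= and X= values, overridden by the headers if present.
   The headers themselves are left untouched. */
void incoming_charset(const struct header *h, size_t n,
                      const char *mailer_charset, int mailer_esc,
                      const char **charset, int *esc)
{
    const char *v;
    assert(charset != NULL && esc != NULL);
    *charset = mailer_charset;
    *esc = mailer_esc;
    if ((v = find_header(h, n, "X-Charset")) != NULL)
        *charset = v;
    if ((v = find_header(h, n, "X-Char-Esc")) != NULL) {
        char *end;
        long e = strtol(v, &end, 10);      /* assumed decimal, like X= */
        if (end != v && e > 0 && e < 256)
            *esc = (int)e;
    }
}
```

The body would then be converted from *charset to ASCII using *esc as the compose character, as item 2 describes.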
Keld(_dot_)Simonsen(_at_)dkuug(_dot_)dk 1991-03-24