ietf-822

SWEDISH CHARACTERS IN EMAIL: THE SUNET INITIATIVE

1994-11-16 01:55:43

The Board of Directors of the Swedish University Network, SUNET, has
started a project to address the problem of Swedish characters in
electronic mail. The current situation, in which several different
character sets are in use simultaneously, is clearly unacceptable.

The main problem concerns the last three letters of the Swedish alphabet,
å ("a with ring"), ä ("a with umlaut") and ö ("o with umlaut"), which are
often displayed incorrectly on the mail recipient's screen. There are two
reasons for this:

1. The sender and the recipient use different character sets* (Swedish
   7-bit*, Latin-1*, Macintosh, PC, etc.)

2. A program that transports electronic mail has destroyed or altered the
   message's contents so that the recipient's mail program cannot
   interpret them.

   A typical symptom of this is that the recipient sees "EDV" instead of
   the uppercase Swedish letters (or "edv" instead of the lowercase ones).
   It happens when a mail transport program that handled the message on
   its way from sender to recipient changed the high bit from one to zero
   in every octet/byte: in Latin-1, clearing that bit turns the codes for
   "a with ring", "a with umlaut" and "o with umlaut" into the ASCII codes
   for "E", "D" and "V". Such behavior is nevertheless fully in accordance
   with the SMTP* standard used in SUNET and on the Internet.
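Both failure modes can be reproduced in a few lines. The Python sketch below is illustrative only: it assumes the sender used Latin-1, takes IBM code page 437 as a stand-in for the "PC" character set, and uses a hypothetical helper `strip_high_bit` to imitate a relay that clears the high bit of each octet.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear the high bit of every octet, as a 7-bit-only mail relay might."""
    return bytes(b & 0x7F for b in data)


# The sender writes Swedish text in Latin-1 (ISO-8859-1).
sent = "ÅÄÖ åäö".encode("iso-8859-1")

# Reason 1: the octets arrive intact, but the recipient assumes another
# character set (code page 437 here), so different glyphs are displayed.
print(sent.decode("cp437"))

# Reason 2: a relay clears the high bit of every octet in transit.
received = strip_high_bit(sent)
print(received.decode("ascii"))  # EDV edv
```

The second result follows directly from the code values: Å, Ä and Ö are 0xC5, 0xC4 and 0xD6 in Latin-1, and without their high bit they become 0x45, 0x44 and 0x56, which are "E", "D" and "V" in ASCII.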

After a meeting of electronic mail support staff from Swedish
universities on the 28th of September, and after consultation with the
SUNET Technical Advisory Group, SUNET has decided to recommend that,
starting on the 1st of January, 1995, all electronic mail sent outside an
individual organization should conform to the MIME* standard for
electronic mail on the Internet. Electronic mail sent within an
organization ought to conform to this standard, too.

The character set to be used for Swedish text is Latin-1. The Swedish
characters are represented by high octets* in this character set.

According to the SMTP standard for electronic mail, which has been in use
since the early 1980s, high octets should never be used. This is still the
case: under NO circumstances should high octets simply be transmitted 'as
is' on the network. The preferred solution is to implement ESMTP*, the
extended version of SMTP. This protocol allows high octets to be
transmitted only when the receiving system confirms that it can handle
messages containing them. Alternatively, messages containing Latin-1 text
can be encoded before transmission using the 'Quoted Printable'* encoding
described in the MIME standard.
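The choice between these two methods can be sketched as follows. This is a minimal Python illustration, not SUNET's or any mail program's actual logic: the function name `prepare_body` is hypothetical, and it assumes the caller has already collected the keywords the receiving server listed in its reply to 'EHLO'. If '8BITMIME' is among them, the Latin-1 octets are sent unchanged with 'BODY=8BITMIME'; otherwise the body is Quoted Printable encoded with Python's standard `quopri` module.

```python
import quopri

MIME_HEADERS = [
    "MIME-Version: 1.0",
    "Content-Type: text/plain; charset=ISO-8859-1",
]


def prepare_body(text: str, ehlo_keywords: set[str]) -> tuple[str, list[str], bytes]:
    """Return (MAIL FROM parameter, MIME headers, body octets) for Latin-1 text.

    Hypothetical helper: ehlo_keywords are the extension keywords the
    receiving server announced in its EHLO reply (compared case-insensitively).
    """
    raw = text.encode("iso-8859-1")
    if "8BITMIME" in {k.upper() for k in ehlo_keywords}:
        # The receiver has confirmed that it accepts high octets: send as is.
        return ("BODY=8BITMIME",
                MIME_HEADERS + ["Content-Transfer-Encoding: 8bit"],
                raw)
    # No confirmation: encode so that only 7-bit octets go on the wire.
    return ("",
            MIME_HEADERS + ["Content-Transfer-Encoding: quoted-printable"],
            quopri.encodestring(raw))


print(prepare_body("Räksmörgås", {"8BITMIME", "SIZE"}))
print(prepare_body("Räksmörgås", set()))
```

A real client would pass the returned parameter along with its 'MAIL FROM' command; Python's `smtplib`, for instance, accepts such parameters through the `mail_options` argument of `SMTP.sendmail`.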

This means that the former SUNET recommendation to use the Swedish 7-bit
character set in electronic mail will no longer be valid after January 1st,
1995.

On the other hand, we do not consider it appropriate at this time to
recommend MIME for use in Internet News*: no official Internet
recommendation for this exists yet, and few programs for reading and
creating MIME articles are available. The recommendation that the Swedish
7-bit character set be used for News remains in effect until further
notice. (High octets should not be sent in News articles.)

It is no doubt unsatisfactory that two fundamentally different methods for
the representation of Swedish text are used in electronic mail and News,
especially in consideration of the close relationship between these two
services. SUNET therefore wishes to stimulate discussion on how the problem
with representing Swedish characters in News can be solved. The discussion
will be conducted in the News group swnet.mail.

SUNET intends to recommend the general use of the Latin-1 character set in
plain text files and HTML* files provided by Gopher*, World Wide Web (WWW)*
and anonymous FTP* services. Gopher menus and WWW page titles should also
use Latin-1. Whether this is advisable will be discussed in the News group
swnet.mail.

SUNET plans to evaluate electronic mail programs in the Macintosh, MS
Windows and UNIX environments in order to ease the transition to MIME. The
evaluation will be limited to each program's MIME compatibility and
usability. SUNET will then recommend suitable programs.

SUNET will also continue development of the EMIL electronic mail conversion
system in order to improve its functionality and ease of installation and
configuration. Using EMIL it is possible to provide MIME support in
environments where a transition to MIME cannot be accomplished within the
given time frame.

More information about MIME, and about this project in particular, is
available via the World Wide Web. The URL is:

        http://www.nada.kth.se/sunet-mime/

Some of the documents can also be acquired via anonymous FTP to
<ftp.nada.kth.se> from the directory "pub/sunet-mime".

SUNET recommends that further discussion concerning this project be
conducted in the News group swnet.mail. Questions and suggestions can be
sent to the project members at the electronic mail address
<sunet-mime-info(_at_)sunet(_dot_)se>.


GLOSSARY
--------

Character Set: A complete set of rules for how different characters are
represented in a computer using different combinations of bits (quantities
that are either 0 or 1).

Swedish 7-bit Character Set: The character set for Swedish text which
became popular in the early 1980s. It is similar to the American ASCII
character set, except that the braces, brackets and some other special
characters are replaced by the Swedish diacritic letters. It is a Swedish
standard whose registered MIME name is 'SEN_850200_B'. It is also
informally referred to as "Swedish ASCII".
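Converting old Swedish 7-bit text to Latin-1 is essentially a character substitution. The Python sketch below is a simplified illustration under a stated assumption: it maps only the six bracket and brace positions that carry the Swedish letters and leaves every other character unchanged, although the full SEN_850200_B table redefines a few further positions. The names `SWEDISH_7BIT_LETTERS` and `swedish7_to_latin1` are hypothetical.

```python
# Simplified mapping (assumption): only the six positions that carry
# Swedish letters; other SEN_850200_B differences from ASCII are ignored.
SWEDISH_7BIT_LETTERS = str.maketrans({
    "[": "Ä", "\\": "Ö", "]": "Å",
    "{": "ä", "|": "ö", "}": "å",
})


def swedish7_to_latin1(data: bytes) -> bytes:
    """Reinterpret Swedish 7-bit octets and re-encode the text as Latin-1."""
    text = data.decode("ascii").translate(SWEDISH_7BIT_LETTERS)
    return text.encode("iso-8859-1")


print(swedish7_to_latin1(b"R{ksm|rg}s").decode("iso-8859-1"))  # Räksmörgås
```

The same substitution explains why Swedish 7-bit text appears as "R{ksm|rg}s" on a display that assumes plain ASCII.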

Latin-1: The character set that will be recommended for use within SUNET.
It is already used in Microsoft Windows and on many UNIX computers. It is
twice the size of ASCII and of the Swedish 7-bit character set, and
contains not only the entire ASCII character set but also the accented
letters and similar characters used by most western European languages.
It is an international standard whose officially registered MIME name is
'ISO-8859-1'.

High Octets: Octets (bytes) in which the highest bit is a one (1). All
information in a computer is stored as combinations of zeros and ones
(bits), which are often handled in groups of eight called octets or bytes.
Eight bits allow 256 different combinations, commonly numbered 0 to 255
inclusive; the high octets are those with values from 128 to 255. Latin-1
represents each character by a single octet, so it can contain 256
characters. The Swedish diacritic letters are represented by high octets
in Latin-1.
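The arithmetic above can be checked directly. In this short Python sketch (the helper `is_high_octet` is a hypothetical name), every octet value decodes to exactly one Latin-1 character, the low half agrees with ASCII, and the six Swedish letters all fall in the high range.

```python
def is_high_octet(octet: int) -> bool:
    """True when the highest bit (value 128) is set, i.e. 128 <= octet <= 255."""
    return (octet & 0x80) != 0


# One octet per character: all 256 values decode to 256 Latin-1 characters.
latin1 = bytes(range(256)).decode("iso-8859-1")
assert len(latin1) == 256

# The low octets 0-127 are the same characters as in ASCII.
assert latin1[:128] == bytes(range(128)).decode("ascii")

# Prints each letter with its value, its eight bits and whether it is high,
# e.g. the first line is: Å 197 11000101 True
for ch in "ÅÄÖåäö":
    octet = ch.encode("iso-8859-1")[0]
    print(ch, octet, format(octet, "08b"), is_high_octet(octet))
```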

Quoted Printable: A method, defined in MIME, of temporarily representing
high octets as low octets during transport. In this encoding the uppercase
Swedish letters Å, Ä and Ö ("a with ring", "a with umlaut" and "o with
umlaut") are represented as '=C5', '=C4' and '=D6', and the lowercase
letters å, ä and ö as '=E5', '=E4' and '=F6'.
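The rule for high octets is simple enough to write by hand: each one becomes '=' followed by its value as two uppercase hexadecimal digits. The Python function below is a deliberately partial sketch with a hypothetical name: complete Quoted Printable (as implemented, for example, by Python's `quopri` module) also escapes '=' itself, control characters and trailing whitespace, and limits lines to 76 characters, none of which this code does.

```python
def qp_escape_high_octets(data: bytes) -> bytes:
    """Replace each high octet with '=XX' (uppercase hex); copy others as is.

    Partial sketch only: real Quoted Printable also escapes '=', control
    characters and trailing whitespace, and wraps long lines.
    """
    out = bytearray()
    for octet in data:
        if octet >= 128:
            out += b"=%02X" % octet
        else:
            out.append(octet)
    return bytes(out)


# Prints the six codes listed above: =C5 =C4 =D6 =E5 =E4 =F6
for ch in "ÅÄÖåäö":
    print(ch, qp_escape_high_octets(ch.encode("iso-8859-1")).decode("ascii"))
```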

SMTP: Simple Mail Transfer Protocol. The fundamental standard used for
electronic mail in SUNET and on the Internet. It is defined by the Internet
document RFC* 821.

MIME: Multipurpose Internet Mail Extensions. An extension of the Internet
electronic mail standards (SMTP and the RFC 822 message format) which
describes how characters not included in ASCII, as well as multimedia
information, can be transmitted on the Internet. MIME is defined in the
Internet documents RFC 1521 and RFC 1522.

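ESMTP: Extended Simple Mail Transfer Protocol. An extension of SMTP which,
among other things, enables transmission of high octets in electronic
mail. The sender opens the session with the 'EHLO' command instead of
'HELO' and, if the receiver announces support for 8-bit MIME transport,
adds the parameter 'BODY=8BITMIME' to the 'MAIL FROM' command. The
extension mechanism is defined in the Internet document RFC 1651 and the
8-bit transport extension in RFC 1652.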

RFC: Request For Comments. A series of technical documents written during
the evolution of the Internet. Among other things, all communication
protocols in the Internet are defined in different RFCs. These documents
are free and are available from many computers on the Internet, including
<sunic.sunet.se>.

Internet News: The first worldwide, fully open computer conferencing
system. Discussions are divided into thousands of different interest groups
called News Groups. All users of the Internet can read News articles as
well as post their own.

World Wide Web (WWW), Gopher, Anonymous FTP: Different methods for reading
and acquiring information, graphics, programs and so forth that are
available on the Internet.

HTML: HyperText Markup Language. The document format normally used for
information that is provided via the World Wide Web.

(This message has been sent in Swedish to the mailing lists:
sunet-mime(_at_)sunet(_dot_)se, nordpost(_at_)nada(_dot_)kth(_dot_)se, 
sunstyr(_at_)vhs(_dot_)se, tref(_at_)vhs(_dot_)se as
well as posted to swnet.general, swnet.mail, swnet.siren,
nordunet.talk.skandinaviska, dk.general, soc.culture.nordic. 
It has been sent in English to the mailing lists 
ietf-822(_at_)dimacs(_dot_)rutgers(_dot_)edu, wg-msg(_at_)rare(_dot_)nl, 
swede-l(_at_)cmuvm(_dot_)csv(_dot_)cmich(_dot_)edu,
as well as posted to the News groups soc.culture.nordic and
comp.mail.mime.)