Re: ISO 2022 (Was: Re: The Swedish Initiative)

Dave Crocker writes:

Keld,

At 3:31 AM 11/19/94, Keld Jørn Simonsen wrote:

I do not think the MIME group gave up on that. There are a number
of character set specifications in the MIME main RFCs (1521 and 1522)


What the MIME group did was to standardize a mechanism for character set
labeling.  It did not standardize any particular character sets.  The
various 8859 flavors are listed as legal, but not required.  The same applies
to the follow-on charset values that have been defined and published.  But
the only thing that everyone is required to support is ASCII.


Well, I think MIME compliant applications are required to support
all the ISO-8859-[1-9] charsets too. How they support them is not
further specified.
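
For illustration, the charset= labelling mechanism discussed above can be
sketched with Python's standard email package. This is a present-day tool
used only to show what a labelled message looks like; the helper name
labelled_message and the sample text are assumptions made for the example,
not anything taken from the RFCs.

```python
from email.message import EmailMessage


def labelled_message(text: str, charset: str) -> EmailMessage:
    """Build a text/plain message whose body is labelled with `charset`.

    Hypothetical helper for illustration: the label only names the
    character set, it does not make the receiver able to display it.
    """
    msg = EmailMessage()
    msg["Subject"] = "charset labelling demo"
    # cte selects the 7-bit-safe transfer encoding used for the body.
    msg.set_content(text, charset=charset, cte="quoted-printable")
    return msg


msg = labelled_message("Keld J\u00f8rn Simonsen", "iso-8859-1")
print(msg["Content-Type"])               # carries the charset= parameter
print(msg["Content-Transfer-Encoding"])  # how the 8-bit bytes were made 7-bit
print(msg.get_content_charset())         # the label as a receiver would read it
```

A receiver that does not know the labelled charset can still read the
label; whether it can show the text is a separate question, which is the
point under dispute here.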

Anyway, Dave, I take your point that we never got to standardise
a uniform character encoding for all MIME mail to use.
And I think the IAB has made a statement that they are not going
to standardize on character set and internationalization issues
for Internet specs for a while.

I think that this means that we are just going to experiment
for some years, and then see what the outcome is. It may also
be that others take over the initiative, be it ISO,
UNICODE or TERENA/RARE or others.

Well, Dave, you know that the original 822-ext list was set up to
discuss 822 with extended character set support.  Your statements


Indeed it was.  For email.  And after more than a year, its response to
that requirement was charset=. What has changed to cause this group to
change the result of its previous work?


I do not think we should change the MIME labelling. I just think
that we may work further on the issue, and that work has already
been done in the field and published as RFCs.

above are simply false.  I wish you would stop bashing international
issues - you should be neutral in the roles you hold within the


Keld, please pay closer attention to my statements.  Nothing in them has
'bashed' desires for international support, quite the contrary.  I HAVE
bashed the efforts to pursue the matter in this list a) beyond the scope of
this group,  b) without a detailed spec for consideration, and c) without
substantive new technologies and/or public adoption experience for the
group to base its work on.


I am not sure what you intend here. But:

a) The definition of an international character set scheme for
Internet mail is clearly within the scope of the ietf-822 list, as
it was the initial reason for setting it up.

b) I believe there are detailed candidates for consideration
for this, already published as RFCs:

1. Mnemonic
2. ISO-2022-INT-1
3. UTF-7

(taken in the order of publishing, according to my aging memory)
I see another candidate, namely SGML/HTML, but their proponents have not
argued this on this list.

c) Since the adoption of MIME in June 92 there is a significant new
technology emerging, namely 10646/Unicode. 10646 was adopted as
a full International Standard (ISO/IEC 10646-1:1993) in May 1993.
Also the Mnemonic technologies have been in worldwide use since
about 1990, and the ISO-2022 techniques (at least in a Japanese
implementation) have been in use for about 10 years. I am not sure
about raw 10646 use on the Internet. (I consider mnemonic to be
a "cooked" 10646 representation - mnemonic can encode 10646 in most
of the other character sets.)

I am not sure how to proceed. I take Dave's wish to move it to
another forum, preferably the ietf-charsets list, but anyway
I am happy to have it on the ietf-822 list and others (except that I
get 3 copies of each mail) as there is quite some discussion, and
I think almost all the interested people are present here. 
I also see a general charset discussion as a central MIME issue.

So should we work on a general scheme for representing
characters in Internet mail?

A few observations: 

1. What we have in MIME today, the ASCII and ISO-8859-? (not -10)
support, is creating enclaves of localized areas where a certain charset
is spoken, and communication between these areas is cumbersome.
For example Western and Eastern Europe cannot communicate efficiently
although many of the characters are the same in iso-8859-1 and
iso-8859-2. The same applies to Turkey using iso-8859-9 and the
rest of Europe. This is also a major problem in the Nordic countries,
with schools opting for iso-8859-10 and the rest for iso-8859-1.
The same problem will arise when we go to 10646: iso-8859-? and
ASCII conformant MIME systems (HW/SW) will not be able to handle 10646.

2. MIME *is* capable of handling universal schemes, viz. mnemonic,
ISO-2022-INT and UTF-7. 

3. Standard MIME creates a hostile environment when downgrading
from 8-bit charsets to 7-bit ASCII (rfc-822) mail, as both
BASE64 and Quoted-Printable are considered unreadable in many
environments in their raw forms (read as plain ASCII).
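
To make observation 3 concrete, here is a small Python sketch of what a
user without MIME software sees when an iso-8859-1 text has been
downgraded to 7 bits. The Danish sample phrase is an assumption chosen
for the example, and Python's built-in utf-7 codec is used only as a
stand-in for the UTF-7 scheme named in observation 2.

```python
import base64
import quopri

# Sample text with three non-ASCII Latin-1 letters (assumed for the demo).
text = "Sm\u00f8rrebr\u00f8d p\u00e5 bordet"
raw = text.encode("iso-8859-1")

# The two standard MIME content transfer encodings, viewed as plain ASCII.
qp = quopri.encodestring(raw).decode("ascii")
b64 = base64.b64encode(raw).decode("ascii")

# UTF-7 keeps ASCII letters as they are and shifts only the other characters.
utf7 = text.encode("utf-7").decode("ascii")

print("quoted-printable:", qp)
print("base64:          ", b64)
print("utf-7:           ", utf7)
```

In the Quoted-Printable form the ASCII letters survive and each 8-bit
character becomes an =XX escape; in the Base64 form nothing of the
original text is recognizable to the eye.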


A recommended goal for the universal character encoding scheme
for Internet mail would be that the specification makes available,
and mandates, a feasible model for downgrading to existing Internet
mail practices, with the best possible usability for users.
You can do things two ways: make a new completely different scheme
and then hope all people will convert to the new system, or make
a downwards compatible system that people with old software still
can cooperate with. I tend to favour the latter solution.
We have already seen a transition period of 2 years on MIME
and it is far from completed. This is partly because of a badly
engineered downgrading scheme (Quoted-Printable). Going from
the current goal of iso-8859-1 in my environment to a universal
scheme could create further problems, e.g. 10646 encodings will
give problems when just read as raw iso-8859-1, and that would also
be the case for ISO-2022-INT, while mnemonics and SGML/HTML
would be understandable as raw iso-8859-1.
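
The difference can be sketched in Python. The assumptions here are
mine: UTF-8 and UTF-16 stand in for "10646 encodings" (the text above
does not say which encoding form is meant), named HTML entities stand in
for the SGML/HTML approach, and the helper names raw_latin1_view and
entity_form are made up for the example.

```python
from html.entities import codepoint2name


def raw_latin1_view(text: str, codec: str) -> str:
    """Encode `text` with `codec`, then misread the bytes as ISO-8859-1.

    This imitates old software that assumes everything is iso-8859-1.
    """
    return text.encode(codec).decode("iso-8859-1")


def entity_form(text: str) -> str:
    """Spell non-ASCII characters as SGML/HTML entities, in pure ASCII."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)
        elif code in codepoint2name:
            out.append(f"&{codepoint2name[code]};")
        else:
            out.append(f"&#{code};")  # numeric fallback when no name exists
    return "".join(out)


sample = "bl\u00e5b\u00e6rgr\u00f8d"  # assumed sample word
print(repr(raw_latin1_view(sample, "utf-8")))      # two odd letters per character
print(repr(raw_latin1_view(sample, "utf-16-be")))  # a NUL before every letter
print(entity_form(sample))                         # readable with some effort
```

The entity form is longer than the original, but a reader with
iso-8859-1 or even plain ASCII software can still guess the word, which
is the kind of downgrading behaviour argued for above.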

The existing practices included should be de jure rfc-822 and
MIME registered charsets, and de facto Japanese, national 646
and "just-send-8-bit" practices. I advocate also supporting
de facto use as this will remove some problems that users (my
customers, they pay!) have; the way to do it would most likely
just be MIME charset-labelling.

Our new scheme should be capable of handling compatibility between
MIME charsets, so that we do not go through the same painful
transition as we do between 7- and 8-bit when we go to
16 bit, and at some time again from 16 to 32 bit.

Keld