An initial motivation for this working group's efforts was to
develop an enhancement to electronic mail which would comfortably
support languages other than English, by extending the set of
characters that could be transmitted. As is often true with such
projects, the bulk of the effort has been devoted to fundamental
enhancement of the email content-structuring infrastructure, with
the resulting, explicit support for multiple character sets being
a relatively small part of the new specification.
Yes, support for multiple character sets was the initial
goal of this effort. And we have reached consensus on it.
Why do you throw a bomb like this into the debate in the
But there has been significant discussion on the topic of
character sets and, I believe, extremely useful education of the
technical community about the issues. Unfortunately, from the
discussions, I have come to one conclusion that I find
Support for international character sets is a messy
problem, for which there is no clear solution and
apparently no significant field experience even with
For example, there are multiple relevant international standards,
and no clear basis for believing that any one of them dominates.
Worse, there is even a standard whose apparent sole purpose is to
select character sets defined in other specifications, just as
RFC XXXX's charset= attribute allows.
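To make the charset= mechanism concrete, here is a minimal sketch using Python's standard email package (a modern tool, not anything RFC XXXX itself defines). The helper name decode_text_part is hypothetical, and it assumes a single-part text message. The point is only that the label, not the octets, tells the receiver which character set applies.

```python
from email import message_from_bytes


def decode_text_part(raw: bytes) -> str:
    """Hypothetical helper: decode a single-part text message using the
    character set named in its Content-Type charset= parameter."""
    msg = message_from_bytes(raw)
    # get_content_charset() returns the parameter value lowercased,
    # or the given default when no charset= parameter is present.
    charset = msg.get_content_charset("us-ascii")
    # decode=True undoes any base64 or quoted-printable transfer
    # encoding and returns the body as bytes.
    body = msg.get_payload(decode=True)
    return body.decode(charset)


# The same three octets, labelled with two different charsets.
latin1 = b"Content-Type: text/plain; charset=ISO-8859-1\n\n\xe6\xf8\xe5"
latin2 = b"Content-Type: text/plain; charset=ISO-8859-2\n\n\xe6\xf8\xe5"
print(decode_text_part(latin1))  # æøå
print(decode_text_part(latin2))  # a different string from the same octets
```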
Well, this is what character sets are all about. They cover
the different languages of the world, because the world has different
languages, and different characters are used to write those languages.
What you are saying is that the world is messy, and we should not try
to accommodate it.
I agree with you that the world is a mess, but some order has been
brought to its use of character sets. This has been done by ISO via
the ECMA registry, and the handling of different character sets is
well defined in a full ISO standard, ISO 2375.
It may be confusing to you, perhaps because of its volume, but it
is still well defined. It is not confusing to me or to a lot of other people.
I, in turn, am confused about video formats and the like, but I realize
that I am not an expert on video formats, so I let the experts write
the specifications on that subject.
As for experience with multiple character sets: there has been
considerable experience with this in Europe, within EUnet, for
more than a full year. So this part has more experience behind
it than the rest of RFC-XXXX, whose specifications are only now about
to be implemented. It sounds to me as if you are saying that the
mechanisms in the new RFC-XXXX have not been tested, so we cannot make
it an RFC. But that will be the case with any new RFC.
One idea to remedy this situation is to first promote RFC-XXXX
as an experimental RFC and then, based on experience with it,
promote it to Proposed Standard and Internet Standard in due time.
By any reasonable measure, the topic of communicating
information in an environment that supports multiple
character sets MUST be viewed as exploratory and
inadequately understood. In Internet parlance, I take
this to mean that any specification of character set
detail MUST be considered to be Experimental.
Electronic mail on the Internet requires a level of
interoperability that is unique, since mail objects traverse a
much larger space than the IP Internet. Further, levels of
implementation compliance tend to be poor. Therefore, I feel that
it is essential that there be a reasonably strong basis for
believing that the Internet understands how to use multiple
character sets.
The same can be said for the other new concepts in RFC-XXXX.
However, RFC XXXX also contains significant references to details
about SPECIFIC character set specifications. I believe that
virtually all such references should be removed, since they refer
to specifications which apparently have little or no concrete
experience and about which there is no general, strong community
sense of comfort. In other words, there is no reason to believe
that the character set mechanisms that are cited will be
sufficient or will be used in the real world, in spite of the fact
that some of the citations are for documents on the international
I think you are using quite strong wording here without sufficient
supporting evidence. It is the consensus of this list and the WG
that the charsets specified in RFC-XXXX are adequate for use in
Internet mail. Why have you not expressed your concerns earlier?
Where exactly do you see the problems?
The rest of this note discusses RFC XXXX details:
In section 7.1.1, "The charset parameter", the text contains an
italicized note which begins "Beyond US-ASCII..." and offers a
view of engineering preference, as well as stating a belief about
the long-term outcome. It includes the sentence "This future ISO
10646 standard will probably provide the best means for universal
text representation." The next paragraph acknowledges that the
spec is not complete. It is my understanding that that area of
work is very much in flux. It therefore seems, to me,
unreasonable to anchor RFC XXXX to that specification. When 10646
gets enough experience and demonstrates its leadership position,
then the Internet can specify its use. At the moment, however,
the field still appears to be open.
10646 is indeed not yet ready, and care should be taken not to
place too much emphasis on it.
The rest of section 7.1.1 goes into detail about specific
character-set-related specifications, including ISO-8859-X,
ISO-2022-JP, ISO-10646, and MNEMONIC. 10646 apparently is at DIS
level. 2022 is a full standard, but it is only a means of switching
among character sets rather than itself specifying a character set.
The status of 8859 is not clear, from the References section of
RFC XXXX. And MNEMONIC is a brand new spec, from the Internet
The ISO 8859 standards are full ISO standards. Please do not state
your own misguided opinions as facts.
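Why a label such as ISO-8859-X matters can be seen by decoding one octet under several parts of the 8859 family. This is a minimal sketch using Python's built-in codecs; the helper decode_under_each is hypothetical, and the example takes no position on the standards-status question argued here.

```python
def decode_under_each(octets: bytes, charsets: list[str]) -> dict[str, str]:
    """Hypothetical helper: decode the same octets under each named charset."""
    return {cs: octets.decode(cs) for cs in charsets}


# One octet, three parts of ISO 8859.
result = decode_under_each(b"\xe6", ["iso-8859-1", "iso-8859-2", "iso-8859-5"])
for cs, text in result.items():
    print(cs, text)  # æ (Latin-1), ć (Latin-2), ц (Latin/Cyrillic)
```

Because every part assigns its own letters to the upper half of the code table, the receiver can only recover the text if the message says which part was used.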
For one thing, the mere presence of such a large set of
alternatives ought to give one pause and further ought to suggest
that no specification should tie itself to any of these documents,
individually or collectively. RFC XXXX should let the character
set area progress at its own pace and should wait for its dynamics
to settle down.
There are full ISO standards on character sets, and that means
they have been brought to a completion that is more settled than
any Internet standard will ever be. Come on! The character sets are well
defined, and it should be possible to handle them in an Internet RFC.
I would say that not being able to handle more than ASCII in RFC-XXXX
in a well-defined way would be a SHOW STOPPER for me and
probably for most other Europeans.
The discussion of ISO-2022-JP includes "It appears necessary to
explicitly specify the ISO-2022 methods that will be permitted in
text mail so as to avoid the need for private agreements about,
e.g., the specific character sets being used in message. IT IS
EXPECTED THAT THOSE INTERESTED IN ISO-2022 MAIL WILL DEVISE AND
PUBLISH SUCH A SPECIFICATION IN THE FUTURE." (emphasis mine.)
In other words, ISO-2022 is not yet usable.
The definition of ISO-2022-JP is, as far as I can see, complete;
it defines the usage unambiguously, and there is consensus in this
WG that the specification is OK.
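As a sketch of what such a definition pins down: ISO 2022 supplies only the escape-sequence switching, while ISO-2022-JP restricts which designations may appear and keeps every octet 7-bit. The example uses Python's built-in iso2022_jp codec; the helper split_runs and its table are hypothetical, and the table assumes the four designations ISO-2022-JP permits (ESC ( B, ESC ( J, ESC $ @, ESC $ B).

```python
# Hypothetical table of the four ISO-2022-JP designation escape sequences.
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-Roman",
    b"\x1b$@": "JIS C 6226-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def split_runs(data: bytes) -> list[tuple[str, bytes]]:
    """Split ISO-2022-JP octets into (designated set, octets) runs."""
    runs: list[tuple[str, bytes]] = []
    current = "ASCII"  # text starts out in ASCII
    buf = bytearray()
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in DESIGNATIONS:
            # An escape sequence closes the current run and switches sets.
            if buf:
                runs.append((current, bytes(buf)))
                buf.clear()
            current = DESIGNATIONS[seq]
            i += 3
        else:
            buf.append(data[i])
            i += 1
    if buf:
        runs.append((current, bytes(buf)))
    return runs


text = "Mail: \u65e5\u672c"  # "Mail: " followed by the two kanji for "Japan"
data = text.encode("iso2022_jp")
assert all(b < 0x80 for b in data)  # every octet is 7-bit
for charset, octets in split_runs(data):
    print(charset, octets)
```

The printed runs should show an ASCII run followed by a JIS X 0208-1983 run, with no private agreement needed to interpret either.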
Discussion of ISO-10646 and MNEMONIC is prefaced with the
statement "The use of the following... is expected to be defined
by forthcoming documents."
In other words, use of 10646 and MNEMONIC is, at this
point, purely speculative.
MNEMONIC is defined in RFC-CHAR, available as
internet-drafts-822ext-charsets-01.txt from your local
Internet-Drafts repository. It has been around since July of this
year, and the new draft is highly compatible with the old one,
mostly reflecting decisions the WG made in Santa Fe.
The statement about "forthcoming documents" was misleading for MNEMONIC.
Appendix F contains detail about current Japanese use of 2022, but
it also states that it expects to be superseded by a more
formal specification. The fact that this appendix is only
informational, refers only to use by a specific community, and is
expected to be replaced (soon?) strongly suggests that it is not
appropriate content for a standards specification.
I think this is a problem with the wording in RFC-XXXX.
I believe the specification of ISO-2022-JP is formal enough
for Internet usage. I would rather cite RFC-CHAR, which is
a document on charsets, and keep more or less the same
wording on ISO-2022-JP.