Re: Response to MIME charset issue

Below you will find a response to the issue raised concerning whether ISO
10646/Unicode meets the MIME requirements for a charset which may be
registered. This analysis was prepared by John Jenkins of Taligent with
assistance from myself, Lee Collins, and Mark Davis (all also of Taligent),
as well as assistance from Nathaniel Borenstein and Ned Freed, the authors
of RFC 1521.


I'm not interested in playing games with terminology.

The fundamental issue here is whether or not Unicode and ISO/IEC 10646
define what MIME considers to be a "charset."  The relevant MIME language
is:

"This RFC specifies the definition of the charset parameter for the
purposes of MIME to be a unique mapping of a byte stream to glyphs, a
mapping which does not require external profiling information."

The terms "character" and "glyph" have specific meanings within ISO usage.


As MIME is not an ISO standard, it is not bound by ISO debates over terminology.

Under a definition that is meaningful for writing real-world applications,
ISO 10646/UNICODE does not give a unique mapping from codes to glyphs.
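
To illustrate the point, here is a minimal sketch. The code point U+76F4
is my own illustrative choice (a unified ideograph commonly drawn
differently in Chinese and Japanese fonts), and UTF-16BE is assumed here
only as a stand-in for a two-byte 10646/UNICODE encoding form. The helper
name describe is hypothetical.

```python
import unicodedata

# U+76F4 is an illustrative choice, not taken from the discussion above:
# a unified ideograph commonly drawn differently in Chinese and Japanese fonts.
ch = "\u76f4"

# Assumption: UTF-16BE stands in for a two-byte 10646/UNICODE encoding form.
data = ch.encode("utf-16-be")


def describe(payload: bytes) -> str:
    """Return the character name for one UTF-16BE-encoded character."""
    return unicodedata.name(payload.decode("utf-16-be"))


# The byte stream is identical whatever language the author wrote in.
print(data.hex())
# The name identifies the unified character and says nothing about its glyph.
print(describe(data))
```

The bytes alone carry no indication of whether a Chinese, Japanese, or
Korean glyph was intended; that choice is left to the receiving system.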

Unicode, 10646, and other ISO character set standards as well as important
national standards such as JIS 0208 explicitly avoid limiting the set of
glyphs which can be used to render the characters they encode.


As many of the C/K Han glyphs are considered to be WRONG by people
in Japan, this is not an issue of legibility.

This is particularly important for ISO/IEC10646, which uses up to four
glyphs to represent characters in its unified East Asian ideograph set.


So, you admit that the C/J/K glyphs are different, don't you?

Nor is the fact that 10646 allows the use of combining marks relevant.
Combining marks are necessarily a part of the encoding of various South
Asian and Semitic languages.  If the issue is the ability to render text
intelligibly as opposed to rendering text exactly, then any Level 3
implementation of 10646 will be able to provide appropriate rendering.


Wrong. As what constitutes "appropriate rendering" is undefined and may
vary from language to language, no implementation can do so.

If it is the intent of MIME to lock users into the specifics of the
bit-layout on the screen or on the page, then no current ISO character set
standard is a "charset," and only a glyph registry such as ISO/IEC 10036
could qualify.


The minimum requirement here is correctness.

We have consulted Ned Freed and Nathaniel Borenstein regarding the intent
of the language within MIME.  Excerpts from their responses follow:


What? It was I who put the wording about charset and glyphs into the MIME
document.

So, you can simply ask me.

Ned Freed:
------------------
The intent [of MIME] here is pretty simple: Given the sequence of bytes in
the body part and the charset value, it must be possible to display the
message in the fashion the message creator intended.


As many of the C/K glyph variants in UNICODE are considered to be wrong
by people in Japan, a Japanese message encoded in UNICODE cannot be
displayed in the fashion the message creator intended.

Nathaniel Borenstein:
-------------------
If anyone thinks that Unicode can't be a MIME character set because of
something RFC 1521 says, then RFC 1521 is wrong.  Period.


I agree. RFC 1521 is wrong. MIME is wrong too.

Our intent very
specifically included being INCLUSIVE of the then-emerging Unicode/10646
standard.


It depends on what "Our" means. Though such an expectation for UNICODE was
expressed in some of the drafts of MIME, it was removed. So, it has not
been the intent of us, the 822ext WG.

Determining the appropriate ISO language for MIME to use is difficult,
because ISO currently lacks the formal concept of "minimal legibility."  It


It, at least, rejects incorrect rendering.

is, however, true that any system supporting Level 3 ISO10646 or Unicode
can intelligibly render plain text in the absence of any further
information.


Yes, it can, as long as the incorrect result is acceptable, which it is not.

                                                Masataka Ohta