Re: 10646, and all that

Finally, there is a significant difference between the way that the ISO 646
family use the same codepoints to mean completely different characters, and
the way that DIS 10646 maps similar glyphs onto a single character...


The DIS does not say that the corresponding CJK characters are
the same single character. Instead, it says that the same code point
is assigned to different "graphic symbols".

in the
latter case, the glyphs being mapped together do resemble each other.


Likewise, the code point for "!" in US ISO 646 (ASCII) should be an
exclamation mark, not a vertical bar, even though the vertical bar
resembles the exclamation mark.

Yes.  And I have no doubt that Japanese speakers will want their mail
readers to be able to display Japanese-style glyphs, and similarly for
Chinese, etc.  But suppose a message arrives with a content like:

content-type: text/plain; charset=ISO-10646-zh

(using the ISO 639:1988 code for Chinese)

Perhaps the recipient can read Chinese, but his mail reader doesn't
have the font for ISO-10646-zh...it will refuse to display it.


If one can read English but his mail reader doesn't have the font for
US-ASCII, it will refuse to display it.

So what?

Or, practically speaking, if it has a font for the French version of
ISO 646, it will display US-ASCII with that font.
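
To illustrate what that looks like, here is a minimal Python sketch. The
override table is partial and reflects my understanding of the French
national variant of ISO 646; it should be checked against the registered
code table before being relied on. The function name is hypothetical.

```python
# Partial table: code points where the French variant of ISO 646 differs
# from US-ASCII (assumed values, not the complete list of differences).
FRENCH_646_OVERRIDES = {
    0x40: "à",  # "@" in US-ASCII
    0x5C: "ç",  # "\" in US-ASCII
    0x7B: "é",  # "{" in US-ASCII
    0x7C: "ù",  # "|" in US-ASCII
    0x7D: "è",  # "}" in US-ASCII
}


def render_with_french_646(ascii_bytes: bytes) -> str:
    """Show how US-ASCII bytes appear when drawn with a French 646 font."""
    return "".join(FRENCH_646_OVERRIDES.get(b, chr(b)) for b in ascii_bytes)


print(render_with_french_646(b"if (x) { y }"))
```

The text stays readable, but the same code points come out as different
characters.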

However, if the content is labelled:

content-type: text/plain; charset=ISO-10646; language=zh
...then the recipient's mail reader will be able to display it if it has
*any* 10646 font.


That is no different from the following situation. With

        charset=ISO-10646-zh

the recipient can set up his mail reader so that it can display it
if it has *any* 10646 font (it is not a font variation, but anyway...).

Moreover, the recipient can set up his mail reader so that it can display
it with a Chinese font if it has a Chinese 10646 font.
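
As a rough sketch of why the two labels carry the same information, a mail
reader could reduce both forms to the same (charset, language) pair. This
assumes the "ISO-10646-zh" naming and the "language" parameter exactly as
written in this thread; neither is a registered value, and the function
name is hypothetical.

```python
from email.message import Message


def charset_and_language(content_type):
    """Reduce either labelling style to a (charset, language) pair."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset = (msg.get_param("charset") or "us-ascii").lower()
    language = msg.get_param("language")
    prefix = "iso-10646-"
    # Style 1: the language is a suffix of the charset name.
    if language is None and charset.startswith(prefix):
        charset, language = "iso-10646", charset[len(prefix):]
    return charset, language


print(charset_and_language("text/plain; charset=ISO-10646-zh"))
print(charset_and_language("text/plain; charset=ISO-10646; language=zh"))
```

Either way, the reader ends up knowing both the coded character set and
which variation of Han the sender intended.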

Perhaps the Chinese characters will be displayed in
Japanese style...it will be more difficult, but the recipient will probably
be able to read it.  If this recipient gets a lot of Chinese mail, he might
add the Chinese version of 10646 to his system also to remove this
inconvenience.


Suppose a Japanese user and a Chinese user share an environment in
France, which is not so rare a case.

If the language is encoded in a charset, and if the receiver does not
want to display the charset correctly, the receiver does not have to
provide support for the charset.


Agreed.  But it's far more likely that a mail reader will support one
version of 10646 (optimized for a particular language), than it is that the
mail reader will support every version of 10646 that might be needed.


In an environment where no one uses Han, which is typical in Europe,
the mail reader does not have to support Han characters at all. It is
acceptable if all Han characters are displayed as blanks. Then you can
save the space for the font.

In an environment where a single variation of Han is used, the mail
reader has to support that variation. It may display other variations
in whatever form it wants (or, precisely speaking, no one wants them),
even as blanks.

In an environment where two variations of Han are used, the mail
reader must support both variations of Han characters.
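
The three environments above could be sketched as one selection rule.
This is only an illustration: the function name and the font names are
hypothetical, and whether an unsupported variation falls back to another
font or to blanks is a local choice, not something a standard requires.

```python
def pick_han_font(language, installed_fonts):
    """Return the font name to use, or None to display Han as blanks.

    installed_fonts maps a language tag (e.g. "ja", "zh") to a font name.
    """
    if language in installed_fonts:
        # The requested variation is supported: display it correctly.
        return installed_fonts[language]
    if installed_fonts:
        # Another variation is installed: fall back to the first one.
        return next(iter(installed_fonts.values()))
    # No Han font at all (the typical European case): display blanks.
    return None


print(pick_han_font("zh", {}))
print(pick_han_font("zh", {"ja": "han-ja"}))
print(pick_han_font("zh", {"ja": "han-ja", "zh": "han-zh"}))
```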

If the language is not encoded in a charset, if no language information
is provided otherwise, and if the receiver wants to display the charset
correctly, the receiver can't.


No, but it can approximate it.


Then we don't need "charset" at all. Almost all mail uses
US-ASCII and, if not, one can approximate it.

But I don't want to require the
recipient's mail reader to support every possible variant of 10646.


It is the sender's job to supply the necessary information. If the
recipient does not want it, because the recipient is not a native user of
the language, the recipient can ignore the information.

But don't forget that, in almost all cases (except for English mail in
ASCII), mail in a specific language is exchanged between native users of
that language.

Can you explain why you are insisting that the language name be contained in
the character set?  What is the advantage of having the language name part
of the character set name, instead of as a separate parameter?


Because the separation problem is specific to ISO 10646 and not a general
charset issue, it is absurd to introduce a new concept.

We don't have a clear idea of what

        Content-language:

is. Several people said it is useful for spell checking, and I said
that, with ISO 639, it is not.

Moreover,

        Content-Type: charset=iso-10646
        Content-language: Chinese, Japanese

is completely meaningless for display purposes. But how can you
document that?

So, do you want to introduce another confusion?

                                                Masataka Ohta