ietf-822

Re: 10646, and all that

1993-03-10 22:35:51
To:  ietf-822@dimacs.rutgers.edu
Subject:  Re: 10646, and all that
Date:  Wed, 10 Mar 93 17:29:24 JST

> Or, are you saying that the following specification:
>
>   Content-type: text/plain; charset=ASCII
>   Content-language: French
>
> is better than:
>
>   Content-type: text/plain; charset=ISO646-FR


Yes!  Actually I prefer a language= parameter, but I definitely want
to separate the language specification from the character set.
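
To make the difference concrete, here is a minimal sketch using Python's standard `email` package that reads both labelling styles. It is an illustration only: the `language=` parameter is the one proposed in this message, not a registered MIME parameter, and the helper name `charset_and_language` is hypothetical. As an assumption, it also accepts a Content-language header as in the first example above.

```python
from email import message_from_string
from typing import Optional, Tuple


def charset_and_language(headers: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (charset, language) declared in a block of MIME headers.

    Assumption: the language comes from a proposed "language=" parameter
    on Content-Type or, failing that, from a Content-Language header.
    """
    msg = message_from_string(headers + "\n\n")
    # The email package lower-cases charset names (they are case-insensitive).
    charset = msg.get_content_charset()
    language = msg.get_param("language") or msg.get("Content-Language")
    return charset, language


print(charset_and_language("Content-type: text/plain; charset=ASCII\nContent-language: French"))
# ('ascii', 'French')
print(charset_and_language("Content-type: text/plain; charset=ISO646-FR"))
# ('iso646-fr', None)
```

With the language kept separate, a reader can act on the charset alone and treat the language as an optional hint.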

> That contradicts the current policy of using the name US-ASCII
> for the only true ASCII.

Not really.  ASCII has always been the *American* Standard Code for
Information Interchange...even though the name ASCII has been applied to
other ASCII-like character sets.  The "US-" that MIME insists on is in some
sense redundant, but it seemed like a good idea to add the US- to make it
more obvious that an ASCII-like set that is not ASCII should not be labelled
as ASCII in MIME mail.

Note that "US" isn't a language, and use of charset=US-ASCII doesn't even
mean that the body part contains English text.  Likewise, the charset
ISO-2022-JP could be used with some other languages besides Japanese (such
as English!).

Finally, there is a significant difference between the way that the ISO 646
family uses the same codepoints to mean completely different characters, and
the way that DIS 10646 maps similar glyphs onto a single character...in the
latter case, the glyphs being mapped together do resemble each other.
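
The ISO 646 half of that contrast can be shown with a short sketch. Python's standard codecs do not include the ISO 646 national variants, so the tables below are hand-written and deliberately partial: a few positions from the German (DIN 66003) and French (NF Z 62-010) variants. The names `ISO646_VARIANTS` and `decode_iso646` are hypothetical. The same three bytes come out as unrelated letters depending on which variant is assumed.

```python
# Partial tables: only some "national use" positions are listed; every
# other 7-bit byte keeps its US-ASCII meaning.
ISO646_VARIANTS = {
    "US": {},
    "DE": {0x40: "§", 0x5B: "Ä", 0x5C: "Ö", 0x5D: "Ü",
           0x7B: "ä", 0x7C: "ö", 0x7D: "ü", 0x7E: "ß"},
    "FR": {0x40: "à", 0x5C: "ç", 0x7B: "é", 0x7C: "ù", 0x7D: "è"},
}


def decode_iso646(data: bytes, variant: str) -> str:
    """Decode 7-bit bytes using one national ISO 646 variant (partial table)."""
    table = ISO646_VARIANTS[variant]
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError(f"byte {b:#04x} is not 7-bit")
        chars.append(table.get(b, chr(b)))
    return "".join(chars)


for variant in ("US", "DE", "FR"):
    print(variant, decode_iso646(b"{|}", variant))
# US {|}
# DE äöü
# FR éùè
```

Nothing in the bytes says which reading is right, which is why an ISO 646 variant has to be named exactly rather than labelled as ASCII.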

> If the language is combined with a charset, then a decent mail reader
> has to support LOTS of charsets.  Especially when you start listing
> combinations of languages -- say, to indicate a mixed English/Japanese
> text.

> If the language is encoded in a charset, and if the receiver wants
> to display the charset correctly, the receiver should provide support
> for the charset.

Yes.  And I have no doubt that Japanese speakers will want their mail
readers to be able to display Japanese-style glyphs, and similarly for
Chinese, etc.  But suppose a message arrives with a content like:

  content-type: text/plain; charset=ISO-10646-zh

(using the ISO 639:1988 code for Chinese)

Perhaps the recipient can read Chinese, but his mail reader doesn't
have the font for ISO-10646-zh...it will refuse to display it.

However, if the content is labelled:

  content-type: text/plain; charset=ISO-10646; language=zh

...then the recipient's mail reader will be able to display it if it has
*any* 10646 font.  Perhaps the Chinese characters will be displayed in
Japanese style...it will be more difficult, but the recipient will probably
be able to read it.  If this recipient gets a lot of Chinese mail, he might
add the Chinese version of 10646 to his system also to remove this
inconvenience.
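
The reader-side difference between the two labels can be sketched as a font-selection rule. This is a hypothetical Python illustration, not any existing mail reader: the font labels (e.g. "ISO-10646-ja" for a 10646 font drawn in Japanese style) and the `choose_font` helper are invented for the example, and the reader treats charset names as opaque strings.

```python
from typing import Iterable, Optional


def choose_font(charset: str, language: Optional[str],
                fonts: Iterable[str]) -> Optional[str]:
    """Pick an installed font for a text part, or None to refuse display.

    Names are compared case-insensitively; charset names are treated as
    opaque labels, so "ISO-10646-zh" is just another string here.
    """
    wanted = charset.lower()
    available = {f.lower() for f in fonts}
    if language:
        # Best case: a font styled for the labelled language.
        preferred = f"{wanted}-{language.lower()}"
        if preferred in available:
            return preferred
    if wanted in available:
        return wanted
    # Approximate: any font covering the same repertoire, e.g. a
    # Japanese-style 10646 font used for Chinese text.
    for font in sorted(available):
        if font.startswith(wanted + "-"):
            return font
    return None


fonts = {"ISO-10646-ja"}  # a reader with only a Japanese-style 10646 font
print(choose_font("ISO-10646-zh", None, fonts))  # None: refuses to display
print(choose_font("ISO-10646", "zh", fonts))     # iso-10646-ja: approximates
```

If the recipient later installs a Chinese-style font, the second label picks it up automatically, while the first label only ever works with that exact font.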

> If the language is encoded in a charset, and if the receiver does not
> want to display the charset correctly, the receiver does not have to
> provide support for the charset.

Agreed.  But it's far more likely that a mail reader will support one
version of 10646 (optimized for a particular language), than it is that the
mail reader will support every version of 10646 that might be needed.
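
A rough count shows how fast "every version" grows. Assuming, for illustration, that each language and each mix of languages (such as the mixed English/Japanese case) got its own combined 10646 label, then with L languages:

```latex
N_{\text{combined}} = \sum_{k=1}^{L} \binom{L}{k} = 2^{L} - 1,
\qquad
N_{\text{separate}} = 1 \text{ charset label} + L \text{ language tags}.
```

For L = 10 that is 1023 combined labels, versus one charset plus ten language tags.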

> If the language is not encoded in a charset, if no language information
> is provided otherwise, and if the receiver wants to display the charset
> correctly, the receiver can't.

No, but it can approximate it.  And I'm assuming that users of 10646 will
want their mail composers to supply language information along with the
content so that their message will be displayed optimally if the recipient's
mail reader is capable of doing so.  But I don't want to require the
recipient's mail reader to support every possible variant of 10646.
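
On the composing side, supplying the language is a one-line addition to the header. Below is a minimal sketch with Python's `email.message.Message`, assuming the proposed `language=` parameter (not a registered MIME parameter); the helper `label_text_part` is hypothetical.

```python
from email.message import Message


def label_text_part(body: str, language: str) -> Message:
    """Build a text part labelled with a 10646 charset plus a language tag.

    "language" is the parameter proposed in this thread, not a registered
    MIME parameter; values such as "zh" or "ja" are ISO 639 codes.
    """
    part = Message()
    part["Content-Type"] = "text/plain; charset=ISO-10646"
    # set_param appends the parameter (quoted) to the existing header.
    part.set_param("language", language)
    part.set_payload(body)
    return part


part = label_text_part("...", "zh")
print(part["Content-Type"])  # text/plain; charset=ISO-10646; language="zh"
```

A reader that does not understand the extra parameter can ignore it and still find the charset it needs.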

Can you explain why you are insisting that the language name be contained in
the character set?  What is the advantage of having the language name part
of the character set name, instead of as a separate parameter?

Keith

