ietf-822

Re: restrictions when defining charsets

1993-02-04 03:24:05
Dave:

Certainly, additional character sets can and should be registered with
IANA for use with MIME. But this is not a matter that needs discussion
among the Mime list.  IANA registration is a simple procedure.

As long as we are registering a character set, the registration should
be simple.

The problem here is the question of what is a character set and what is not.

      charset=iso-10646-sanskrit-japanese-utf2

Well, perhaps IANA registration WON'T be simple, since you are raising some
issues to debate.  But again, I don't think that the topic is of concern
to the MIME list, though it certainly is important to make the charset
name be appropriate.

As "charset" is a MIME concept, its meaning should, I think, be precisely
defined by the MIME group.

According to the following definition:

    The Working Group specified the definition of a character set
    for the purposes of quad-x to be a unique mapping of a byte
    stream to glyphs, a mapping which does not require external
    profiling information.

"charset" should provide all the profiling information to uniquely
map a byte stream to glyphs.

Thus, bare Unicode, which can't map some Devanagari and some Han correctly,
can't be a "charset".

The term "correctly" here means that native users of the languages covered
by the "charset" will have no difficulty reading the resulting glyph
representation.

From: ayers(_at_)mv(_dot_)us(_dot_)adobe(_dot_)com

   But, assuming that the only language dependence of Unicode 
   is to Devanagari and to Han, we might be able to register ...

I recall a poster noting the language dependence of accented vowels,
e.g. between English and German ...

Some said that English diaeresis and German umlaut should be
distinguished because they have different MEANING.

But, if the requirement is "a unique mapping of a byte stream to glyphs"
and if diaeresis and umlaut share exactly the same glyph (I don't have
enough cultural background to judge that), it is not necessary for a
"charset" to distinguish them.

                                                Masataka Ohta
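
To illustrate the diaeresis/umlaut point, here is a minimal sketch in
Python. It assumes ISO-8859-1 as the charset and uses two sample words
(chosen for illustration, not taken from the discussion) to show that
the byte stream carries no distinction between the two marks; whether
the glyphs are truly identical remains a typographic question that the
code cannot settle.

```python
# Sample words are illustrative assumptions; escapes avoid any
# dependence on how this source text itself is encoded.
english = "co\u00f6perate"   # o with diaeresis: vowel pronounced separately
german = "sch\u00f6n"        # o with umlaut: changed vowel quality

english_bytes = english.encode("iso-8859-1")
german_bytes = german.encode("iso-8859-1")

# ISO-8859-1 uses one byte per character, so string and byte
# indices coincide.
diaeresis_byte = english_bytes[english.index("\u00f6")]
umlaut_byte = german_bytes[german.index("\u00f6")]

same_byte = diaeresis_byte == umlaut_byte
print(hex(diaeresis_byte), hex(umlaut_byte), same_byte)
```

Both marks arrive as the same byte, so a charset defined purely as a
byte-to-glyph mapping has nothing with which to tell them apart.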