The current version of the MIME document (i.e. mime2) says:
    The term "character set", wherever it is used in this
    document, refers to a unique mapping of a byte stream to
    glyphs, a mapping which does not require external profiling
    information. For example, bare "ISO 10646" can't be the
    charset parameter, because it requires several language
    information for the unique mapping to glyphs. However, this
    term can refer to multibyte character sets and to extension
    techniques such as those used in ISO 2022.
and later:
    This RFC specifies the definition of the charset parameter
    for the purposes of MIME to be a unique mapping of a byte
    stream to glyphs, a mapping which does not require external
    profiling information. For example, bare "ISO 10646" can't
    be the charset parameter, because it requires several
    language information for the unique mapping to glyphs.
Why has the stuff about bare 10646 been added to the document? As far
as I can see, there was no consensus *at all* concerning this issue on
this mailing list.
The term "glyph" is used exactly 4 times in the whole document, and
all 4 of those occurrences are in the material quoted above, but there
is no definition for this term, nor is there any pointer to a document
that defines the term.
However, I *suspect* that MIME's "charset" parameter is not intended
to indicate which glyphs are being represented in the message. Since
this is only a suspicion of mine, I would like to hear what everybody
else thinks.
There are several ways to write the letter "a", including:
     ***          ***
    *   *        *   *
        *        *   *
     ****        *   *
    *   *        *   *
    *   *        *   *
     *** *        **** *
These two are different glyphs, but they are the same character. (The
terms "glyph" and "character" have been the subject of lots of debate,
especially in the ISO/IEC JTC1 SCs 2 and 18, but I *think* everyone
would agree about the above example.)
There seems to be a consensus in this group that us-ascii, iso-8859-1
and iso-2022-jp are MIME "charsets".
The first two, us-ascii and iso-8859-1, are what ISO usually calls
"coded character sets". The last one, iso-2022-jp, is actually a
well-defined combination of 4 coded character sets (typically, only 2
are used in any one message).
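To make the iso-2022-jp point concrete, here is a rough Python sketch that scans a message body for the escape sequences that switch between the coded character sets. The four sequences are the ones I understand RFC 1468 to list; the function name sets_designated is my own invention for illustration, and "iso2022_jp" is simply the name Python's standard codec uses for this charset.

```python
# Escape sequences that designate the coded character sets of
# iso-2022-jp (assumed here to be the four listed in RFC 1468).
DESIGNATIONS = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 (Roman)",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def sets_designated(data: bytes) -> list:
    """Return the coded character sets designated in `data`, in order of first use.

    Hypothetical helper, for illustration only.
    """
    used = ["ASCII"]  # an iso-2022-jp stream starts out in ASCII
    i = 0
    while i < len(data):
        name = DESIGNATIONS.get(data[i:i + 3])
        if name is None:
            i += 1  # ordinary data byte, not the start of a designation
            continue
        if name not in used:
            used.append(name)
        i += 3  # skip over the three-byte escape sequence
    return used


# The same byte stream decodes to characters; nothing here says
# which glyphs will be used to display them.
body = "MIME \u306f\u4fbf\u5229".encode("iso2022_jp")
print(sets_designated(body))
print(body.decode("iso2022_jp"))
```

Note that the sketch only talks about bytes, escape sequences and characters. The choice of glyph is left entirely to whatever font the receiving end happens to use.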
So, in my view, a MIME charset is *not* a glyph encoding. Well, what
*is* a charset, then? That's the hard question.
John's suggestion of writing a separate document about the kinds of
things that can be registered with IANA as "charsets" seems OK, but I
can't help thinking it would be nice if there were some guidance in
MIME itself, in much the same way that text subtypes are explained.
We could have some prose that explains the intent of The Three Rules,
and it would be up to IANA whether or not to register a particular
proposal. So far, IANA has not been very strict. I'm not sure
whether that's a good thing, but then it may just be the Internet Way.
Anything can be registered, but to succeed, it has to prove itself in
the field.
Erik