I really have to strongly disagree with Greg's position. 2022, 10646, etc
really *have* to be treated as character sets and not content types.
I agree with most of this, with one exception: the
phrase "2022, 10646, etc" is meaningless.
To deal with the easy problem first, 10646 isn't a character set, it
isn't a standard, it isn't anything other than an idea and a proposal
that is still undergoing change. But, assuming it eventually emerges as
an International Standard, there is every reason to expect that it will
be a character set.
2022 isn't a character set. It never has been, and never will be. It
is a collection of "code extension techniques".
(or any other content type) that uses 2022 the way ASCII is used now.
To use ASCII the way ASCII is used now, you need only to specify
"ASCII". To use ISO 8859-1 the way ASCII is used now (at least in an
8bit environment), you need only to specify "iso_8859-1". These are
character sets. Presumably, to use 10646 the way ASCII is used now, you
will only need to specify "iso-10646" although, as with 8859-1, you may
need to specify further encoding to use it in a 7bit (or even an 8bit)
environment.
How, for example, would you send something of type "Makefile" that uses
japanese characters encoded using 2022?
I agree that it is reasonable to handle it. I still haven't seen a
serious proposal for doing so. The problem is that the phrase "japanese
characters encoded using 2022" isn't a character set in the above sense.
It is an abbreviation for three separate pieces of information:
(i) ISO 2022 for code extension techniques
(ii) A very specific profile about which features of 2022 are used and
how. This profile contains the notion--which 2022 does not provide
for--of an initial character set.
(iii) A list of registered character collections which will be
involved using 2022 conventions.
If we see a way to specify triples like that, or if we, by fiat,
define the two (that I know of) Japanese encodings that use 2022 to be
character sets that happen to use 2022 as part of their specifications,
then the result can be treated as "character sets". 2022 by itself can't.
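
As a rough illustration of the "by fiat" option, here is a minimal Python
sketch in which one charset name stands for the whole triple. Everything
named here is hypothetical: the Iso2022Usage type, the BY_FIAT table, the
resolve() helper, and the "x-example-..." labels are invented for the
example, and the listed collections are only a plausible guess at what
such a profile would contain.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Iso2022Usage:
    """The three pieces of information hidden behind '... encoded using 2022'."""
    technique: str                 # (i) the code extension technique
    profile: str                   # (ii) which 2022 features are used, and how
    initial_set: str               # (ii) the profile's initial character set
    collections: Tuple[str, ...]   # (iii) registered collections involved


# Hypothetical registry: a single name is declared, by fiat, to mean a triple.
BY_FIAT: Dict[str, Iso2022Usage] = {
    "x-example-japanese-2022": Iso2022Usage(
        technique="ISO 2022",
        profile="x-example-profile",
        initial_set="ASCII",
        collections=("ASCII", "JIS X 0208"),
    ),
}


def resolve(charset: str) -> Optional[Iso2022Usage]:
    """Return the triple for a registered name, or None if there is none."""
    return BY_FIAT.get(charset.lower())


usage = resolve("X-Example-Japanese-2022")  # a fully specified triple
bare = resolve("iso-2022")                  # "2022" alone names nothing usable
print(usage)
print(bare)
```

The bare "iso-2022" lookup returns nothing, which is the point: without a
profile and a list of collections there is nothing for a receiver to act on.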
On the same theme, why treat well specified 2022 character sets any
differently than iso-8859-1?
Because they contain control sequences that affect the semantics of
the message. Whether that is a big deal or not depends on whom you
listen to and what religion they subscribe to.
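
To make the control-sequence point concrete, here is a small Python sketch.
It assumes that Python's standard "iso2022_jp" codec is representative of
the 2022-based Japanese encodings under discussion, which is an assumption
of this example rather than anything established in this thread; the
sample text is also invented.

```python
ESC = b"\x1b"

# "abc <three kanji> def"; escapes are used so the source stays pure ASCII.
text = "abc \u65e5\u672c\u8a9e def"

latin = "abc def".encode("iso8859-1")
jp = text.encode("iso2022_jp")

# 8859-1 needs no in-band switching; the 2022-based encoding does.
latin_has_escape = ESC in latin
jp_has_escape = ESC in jp

# Every octet of the 2022-based form fits in 7 bits.
jp_is_seven_bit = all(b < 0x80 for b in jp)

# A receiver that drops the escape sequences still sees legal ASCII octets,
# but they no longer mean what the sender wrote.
stripped = jp.replace(ESC + b"$B", b"").replace(ESC + b"(B", b"")
misread = stripped.decode("ascii")

print(jp)
print(misread)
```

The octets between the escape sequences are ordinary printable ASCII values,
so only the control sequences tell a receiver how to interpret them. That is
the sense in which they affect the semantics of the message.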