Re: Character-set header (was Re: Minutes of the Atlanta 822ext meeting)

Date: Thu, 29 Aug 1991 11:42:42 -0400 (EDT)
From: Nathaniel Borenstein <nsb(_at_)thumper(_dot_)bellcore(_dot_)com>

[...]

The biggest problem I have with this header is that it will, much of the
time, be meaningless.  I have a problem with headers that are defined
because their meaning is semantically crucial to the message, but which
have no rational application.  Let's consider the 9 message types
defined by RFC-XXXX:

text -- character set info is meaningful, either as a subtype OR using
"Character-set."

[...]

text-plus -- character set info is potentially meaningful, but
problematic because many (most?) rich text formats already provide their
own mechanism for encoding multiple character sets anyway.  If you have
a text-plus format that has its own mechanisms for specifying character
sets, is the Character-set header simply ignored?  Ugh.


For text-plus, the character-set (however specified) could simply be
the "default" in the absence of a specification within the text-plus
body part.  This could be overridden within the body part
(say with <iso8859_1>gibberish<\iso8859_1>) or could simply be ignored
for content-types for which character-set is meaningless.
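
A minimal sketch of the "default plus in-body override" idea, for
illustration only.  The tag syntax is modelled on the <iso8859_1>
example above and is an assumption, not part of any draft; the function
name charset_runs and the "us-ascii" fallback are likewise hypothetical.

```python
import re

# Hypothetical in-body markup: <name> opens a character set, </name> or
# <\name> closes it.  Real text-plus formats define their own mechanisms.
TAG = re.compile(r"<([/\\]?)([A-Za-z0-9_]+)>")


def charset_runs(body, header_charset=None, fallback="us-ascii"):
    """Split a body into (charset, text) runs.

    The character set from the header (if any) is only the default;
    markup inside the body overrides it until the matching close tag.
    """
    stack = [header_charset or fallback]
    runs = []
    pos = 0
    for match in TAG.finditer(body):
        if match.start() > pos:
            runs.append((stack[-1], body[pos:match.start()]))
        if match.group(1):            # closing tag: restore previous charset
            if len(stack) > 1:
                stack.pop()
        else:                         # opening tag: override the default
            stack.append(match.group(2))
        pos = match.end()
    if pos < len(body):
        runs.append((stack[-1], body[pos:]))
    return runs
```

With a header default of "us-ascii", the body
"abc <iso8859_1>xyz</iso8859_1> def" splits into three runs, of which
only the middle one uses iso8859_1.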

Now, given these facts, what does it mean to a UA if it sees a
character-set header?  It means that you have to look at the
Content-type header.  If that happens to be "text", the meaning is
obvious, but otherwise it is at best confusing and at worst totally
undefined.


I think the rule is: Look first at the content-type header.  It tells you
what kind of object the body part contains, and in addition it contains all
of the parameters that must always be specified for that particular
content-type. Other parameters may be specified as well; these are in other
headers.

So the fact that a character-set header exists for a body part doesn't mean
anything unless the specification for its particular content-type defines
the meaning of a character-set header.  If a body part header is included
that isn't defined in the specification for that content-type, it should be
ignored.
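
As a rough sketch of this rule, assuming a UA keeps a table of which
extra headers each content-type's specification defines.  The table
contents and the name applicable_headers are hypothetical; "audio"
appears only as an example of a type that defines no character-set
semantics.

```python
# Hypothetical table: extra body-part headers that each content-type's
# specification gives a meaning to.  Anything not listed is ignored.
DEFINED_HEADERS = {
    "text": {"character-set"},
    "text-plus": {"character-set"},
    "audio": set(),
}


def applicable_headers(content_type, headers):
    """Return only the extra headers that are meaningful for content_type.

    The content-type is consulted first; a header that its specification
    does not define is dropped rather than interpreted.
    """
    top_level = content_type.split("/", 1)[0].strip().lower()
    allowed = DEFINED_HEADERS.get(top_level, set())
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in allowed
    }
```

Under these assumptions a Character-set header survives for a "text"
body part and is silently dropped for an "audio" one, or for any type
the table does not know about.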

But if the Content-type has a critical impact on the
semantics of the character set specification, why shouldn't it be
specified as part of the content-type?  And if only one (or even 2 or 3)
content-type can sensibly have character set information, why not make
that information part of the content-type for that (those) specified
type(s)?

I have yet to hear any clear reason why the "text/char-set" model is
inadequate, and I find the addition of a Character-set header
potentially very confusing.  I would strongly advocate that we return to
using text subtypes to specify character sets.  This is close to being a
showstopper for me, though I'm trying to keep an open mind.


The only problem that I remember about text/char-set is that sometimes
(but not always) you need a char-set for text-plus also.  You could
make character-set a required parameter (and therefore part of the
content-type) for text and an optional parameter for text-plus, but that
seems likely to cause confusion.

Of course, if we use extra headers at all, UA implementors have to invent a
mechanism to pass them to the programs defined in the mailcap file.
(Not that everyone would implement a UA this way, but it seems like
in general you would want some extension mechanism.)
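
One possible mechanism, sketched under stated assumptions: the UA
exports each extra header to the viewer program as an environment
variable.  The MAIL_HEADER_ prefix and the name viewer_environment are
invented for illustration; nothing here is defined by mailcap.

```python
import os


def viewer_environment(headers, prefix="MAIL_HEADER_", base_env=None):
    """Build the environment for a viewer program started from mailcap.

    Each extra header becomes a variable such as
    MAIL_HEADER_CHARACTER_SET (hypothetical naming scheme).
    """
    env = dict(os.environ if base_env is None else base_env)
    for name, value in headers.items():
        key = prefix + name.upper().replace("-", "_")
        env[key] = value
    return env
```

The UA would then pass the result as the environment when it runs the
mailcap command, for example via the env argument of subprocess.run.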

Keith