Ned,
Your explanation of the reasons for the separate character set header
was very helpful. I think I now understand enough of what is going on
to have an opinion (which is always dangerous). It seems that there are
two positions:
* Build a separate header so that you absolutely and for certain know
where to find the character set info because, when it is meaningful, you
need to pull it out without understanding how to parse subtypes.
* Keep the information in the subtypes to avoid a number of odd and/or
nasty and/or meaningless situations.
Now, this may be naive, but couldn't *both* goals be accomplished by
being a tad more clever about the content-type syntax? What we have
now, more or less, is
    type/subtype-and-stuff
where the "and stuff" consists of zero or more parameters whose content,
meaning, and order are dependent on the subtype.
What would happen if we invented some peculiar charset-delimiting syntax
and a small restriction such that, for example:
(i) If a character set designation appeared, it would be the first
parameter for the subtype or the subtype itself.
(ii) Some special syntax or designation were used for that parameter/
subtype which would permit one to lexically determine whether the thing
was a character set designator, even if the subtype itself were not
recognized.
I don't have RFC-XXXX in front of me and don't recall the current syntax
proposal, much less the Atlanta conclusions,
but notions like
    text/=ISO_8859-1
    text-plus/troff =ISO_8859-6;...
come to mind. In that model,
    foo/bar =abcdef    Identifies a character set, without requiring
                       you to know the "meaning" of either "foo" or "bar"
and
    xxx/yyy zzz        Does not identify a character set, no matter
                       what.
Whether or not the character set designation was permitted would then be
a function of type and subtype definitions in the same header.
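To make the lexical test concrete, here is a minimal sketch in Python of
how a reader might pull the designator out of such a field. It is only an
illustration under stated assumptions: the "=" introducer is the
placeholder used in the examples above, the function name find_charset is
hypothetical, and I am assuming that the subtype and the first parameter
are separated by white space and that ";" ends the first parameter.

```python
from typing import Optional


def find_charset(content_type: str) -> Optional[str]:
    """Return the character set designator in a Content-type value, or None.

    Assumed syntax (a sketch, not a settled proposal):
        type/subtype [first-parameter][;more parameters]
    A token that starts with "=" is a character set designator. Nothing
    here depends on knowing what the type or subtype means.
    """
    _type, slash, rest = content_type.strip().partition("/")
    if not slash:
        return None  # no "type/..." structure at all

    # Only the text before the first ";" can hold the subtype and the
    # first parameter.
    head = rest.split(";", 1)[0]
    tokens = head.split()

    # Rule (i): the designator is the subtype itself or the first parameter,
    # so only the first two tokens are candidates.
    for token in tokens[:2]:
        # Rule (ii): the leading "=" marks the designator lexically.
        if token.startswith("=") and len(token) > 1:
            return token[1:]
    return None


if __name__ == "__main__":
    for value in ("text/=ISO_8859-1",
                  "text-plus/troff =ISO_8859-6;...",
                  "foo/bar =abcdef",
                  "xxx/yyy zzz"):
        print(value, "->", find_charset(value))
```

The point of the sketch is that the loop never looks at the type or
subtype names, only at the position and first character of a token.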
It is, granted, a little odd, and an "=" as an introducer is probably
not the best model, but wouldn't this permit having one's cake and
eating it too? And it would avoid a lot of robustness problems, e.g.,
if some idiot provided two Content-type fields, you might have to guess
which one to ignore, but you at least wouldn't have to guess which one
was bound to the one (or two) character set fields.
john